Take Home Exercise 03

Building a hedonic pricing models to explain factors affecting the resale prices of Singapore public housing.

Kwek Yi Chen https://kwekyichen-is415.netlify.app/posts/2021-10-27-takehome-ex03/
10-27-2021

1. Introduction

Housing is essential for everyone and is a major investment for many people in Singapore. Structural and location factor affects the price of housing. In this exercise, we are interested to find out which factor affects the price of 4-room HDB flat transacted from 1st January 2019 to 30th September 2020. We will be building a hedonic pricing model using GWR methods in the exercise.

2. Setting Up the Environment

We need to first set up the environment by install the relevant packages and set up the token.

2.1 Installing Packages

The following code chunk install the following packages:

Show code
packages = c('readr', 'tmap', 'sf', 'onemapsgapi', 'httr', 'dplyr', 'stringr', 'jsonlite', 'tidyr', 'sp', 'dummies', 'geojsonio', 'ggplot2', 'ggpubr', 'GWmodel', 'corrplot', 'olsrr', 'spdep')
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
    }
  library(p,character.only = T)
}

2.2 onemapsgapi Token Authorization

Next, we set up the token for onemapsgapi. The code is not shown because the information is sensitive.

3. Data

4. Data Preparation

We will be preparing the data in this section. Note that during the data preparation, files will be written in data/newaspatial as a backup. After data preparation, these backup files will be deleted deleted due to Github’s file limit.

4.1 HDB Resales Data

The following code chunk uses read_csv function of readr package to import the hdb resales data and display the data.

Show code
hdbresale <- read_csv("data/aspatial/resale-flat-prices-based-on-registration-date-from-jan-2017-onwards.csv")
hdbresale

The data contains multiple rows but our focus is only on four-room flat with transaction period from 1st January 2019 to 30th September 2020.

4.1.1 Filter the relevant data

The following code chunk filters the four-room flat data from Jan 2019 to September 2020 using filter function of base package.

Then, we created a new column using mutate function of dplyr package for the following column:

Show code
hdbresale_filtered <- hdbresale %>%
                filter(month %in% c("2019-01", "2019-02", "2019-03", "2019-04", "2019-05", "2019-06", "2019-07", "2019-08", "2019-09", "2019-10", "2019-11", "2019-12","2020-01", "2020-02", "2020-03", "2020-04", "2020-05", "2020-06", "2020-07", "2020-08", "2020-09")) %>%
                filter(flat_type == "4 ROOM")

hdbresale_new <- hdbresale_filtered %>%
                  mutate(hdbresale_filtered, address = paste(as.character(block),as.character(street_name))) %>%
                  mutate(hdbresale_filtered, storey_range_lower = str_sub(storey_range, 0, 2)) %>%
                  mutate(hdbresale_filtered, storey_range_upper = str_sub(storey_range, -3, -1)) %>%
                  mutate(hdbresale_filtered, remaining_lease_year = str_sub(remaining_lease, 0, 2)) %>%
                  select(month, town, address, block, street_name, flat_type, storey_range, storey_range_lower, storey_range_upper, floor_area_sqm, flat_model, lease_commence_date, remaining_lease, remaining_lease_year, resale_price)

hdbresale_new

After filtering, We can see that from Jan 2019 to September 2020, there are 15901 transactions for four-room flat in Singapore.

4.1.2 Get lat long with address

Next, we get the get long with the address provided.

The follow code chunk:

Show code
for (i in 1:nrow(hdbresale_new)) {
  address <- hdbresale_new[i,'address']
  
  url = paste('https://developers.onemap.sg/commonapi/search?searchVal=', address, '&returnGeom=Y&getAddrDetails=Y&pageNum=1')
  encoded_url <- URLencode(url)
  
  getcoordinate <- GET(encoded_url)
  
  jsonParsed <- content(getcoordinate,as="parsed")
  
  if (length(jsonParsed$results) > 0) {
    hdbresale_new[i,'LATITUDE'] = jsonParsed$results[[1]]$LATITUDE
    hdbresale_new[i,'LONGITUDE'] = jsonParsed$results[[1]]$LONGITUDE
  }
}

4.1.3 Check if any rows with missing lat long

The following code chunk use is.na function of base package and sum to calculate the total number of NA is there is any.

Show code
sum(is.na(hdbresale_new$LATITUDE))

There are 26 rows with missing lat long. We will take a look at which areas are the affected rows

4.1.4 Extract rows with missing lat long

The following code chunk filters the rows with missing latlong

Show code
hdbresaleNA <- hdbresale_new %>%
                 filter(is.na(hdbresale_new$LATITUDE))
hdbresaleNA

From the extracted missing rows, the common location with missing lat long is ST. GEORGE.

4.1.5 Replace lat long with result for missing lat long

We will next assign the lat long to these rows as long as the address is “ST. GEORGE” if there are any result

Similar to 4.1.2, this time we set the address as default “ST.GEORGE” and assign to each row as long as there is result

Show code
for (i in 1:nrow(hdbresaleNA)) {
  address = "ST. GEORGE"
  url = paste('https://developers.onemap.sg/commonapi/search?searchVal=', address, '&returnGeom=Y&getAddrDetails=Y&pageNum=1')
  encoded_url <- URLencode(url)
  
  getcoordinate <- GET(encoded_url)
  
  jsonParsed <- content(getcoordinate,as="parsed")
  
  if (length(jsonParsed$results) > 0) {
    hdbresaleNA[i,'LATITUDE'] = jsonParsed$results[[1]]$LATITUDE
    hdbresaleNA[i,'LONGITUDE'] = jsonParsed$results[[1]]$LONGITUDE
  }
}

4.1.6 Check again for missing lat long

The following code chunk check if there are still any missing value

Show code
hdbresaleNA

There are no missing value. However, the lat long for the 26 points are all the same.

4.1.7 Jitter

In order to avoid having overlapping points, the following code chunk use jitter function of base package so that the lat long are distinct. factor is set to random small number so that the jitter would not go very far away from the area.

Show code
hdbresaleNA$LATITUDE <- jitter(as.numeric(hdbresaleNA$LATITUDE), factor = 0.00123)
hdbresaleNA$LONGITUDE <- jitter(as.numeric(hdbresaleNA$LONGITUDE), factor = 0.00123)

4.1.7 Check again for missing lat long

Show code
hdbresaleNA

The lat long are now different.

4.1.8 Extract rows without missing lat long

The following code chunk filters the rows with latlong using drop_na function of tidyr to drop rows with missing LATITUDE values

Show code
hdbresaleNOTNA <- hdbresale_new %>%
                      drop_na(LATITUDE)
hdbresaleNOTNA

4.1.9 Combine rows with missing lat long and without missing lat long

The following code chunk combines hdbresaleNA and hdbresaleNOTNA using rbind function of base package.

Show code
hdbresale_latlong <- rbind(hdbresaleNA, hdbresaleNOTNA)
glimpse(hdbresale_latlong)

We have 15901 rows which is same as the number of rows of the initial dataset.

4.1.9 Visualise the data on interactive map

The following code chunk converts the dataset with latlong decimal degree to metres coordinates, and transform the crs to Singapore’s projection, 3414.

Show code
hdbresale_latlongm <- st_as_sf(hdbresale_latlong,
                    coords = c("LONGITUDE", 
                               "LATITUDE"),
                    crs=4326) %>%
  st_transform(crs = 3414)

The following code chunk plots the points to see if the data is correct using functions of tmap package.

Show code
tmap_mode("view")
tm_shape(hdbresale_latlongm)+
  tm_dots()
tmap_mode("plot")

4.1.10 Write as csv for backup

The following code chunk export data with sf as csv so that we do not have to re-run the time-consuming process above again.

Show code
write.csv(hdbresale_latlong, "data/newaspatial/hdbresale_latlongonly.csv", row.names = FALSE)

4.1.11 hdb_update

The following code chunk import the latest hdb data so that the subsequent calculated data can added.

Show code
hdb_updated <- read_csv("data/aspatial/hdbresale_latlongonly.csv")

4.2 Proximity to CBD

Since CBD is a huge area, we will first define the lat long for CBD, taken from Google. (Latitude : 1.2801, Longitude : 103.8509)

4.2.1 Get lat long

The following code chunk:

This process repeated a few times. This is because the onemapsgapi may return NA value for certain rows due to the request limit. Note that the first data was hdbresale_latlononly, and it was replaced by the data with missing distance, hdbresale_latlononly_CBD_NA_3, because of the repetition of this process.

Show code
for (i in 1:nrow(hdbresale_latlononly_CBD_NA3)) {
  startlat <- hdbresale_latlononly_CBD_NA3[i,'LATITUDE']
  startlong <- hdbresale_latlononly_CBD_NA3[i,'LONGITUDE']
  
  start = gsub(' ', '', paste(startlat, ',',  startlong))
  
  end = '1.2801,103.8509'
  routeType = 'drive'
  
  url = gsub(' ', '', paste('https://developers.onemap.sg/privateapi/routingsvc/route?start=', start, '&end=', end, '&routeType=', routeType, '&token=', token))
  
  getproximitytocbd <- GET(url)
  
  jsonParsed <- content(getproximitytocbd,as="parsed")
  
  if (length(jsonParsed$route_summary) > 0) {
    hdbresale_latlononly_CBD_NA3[i,'PROX_CBD'] = jsonParsed$route_summary$total_distance/1000
  }
}

4.2.2 Write as csv to backup

The following code chunk write a copy of the csv for backup purposes.

Show code
write.csv(hdbresale_latlononly_CBD_NA3, "data/newaspatial/hdbresale_PROXCBD_fifthrun.csv", row.names = FALSE)

4.2.3 Extract rows with missing distance

The following code chunk filter the rows with distance NA, and repeat the above code chunk to obtain distance for remaining rows with NA value.

Show code
hdbresale_latlononly_CBD_NA4 <- hdbresale_latlononly_CBD_NA3 %>%
                              filter(is.na(hdbresale_latlononly_CBD_NA3$PROX_CBD))
hdbresale_latlononly_CBD_NA4

After a few iteration, hdbresale_latlononly_CBD_NA4 no long have any data which means all rows have PROX_CBD distance.

4.2.3 Extract rows without missing distance and write as csv to back up

The following code chunk filter the rows with valid distance value and write a copy of the csv in the file path

Show code
hdbresale_latlononly_CBD_NONA5 <- hdbresale_latlononly_CBD_NA3 %>%
                                  drop_na(PROX_CBD)
hdbresale_latlononly_CBD_NONA5

write.csv(hdbresale_latlononly_CBD_NONA5, "data/newaspatial/hdbresale_proxcbd5.csv", row.names = FALSE)

4.2.4 Combine PROX_CBD data with valid distance

The following code chunk rbind all the dataset after getting the PROX_CBD of every row from the repetitive steps of 4.2.1 - 4.2.3. The final data with PROX_CBD is written as csv for backup

Data to bind includes:

Show code
PROX_CBD <- rbind(hdbresale_latlononly_CBD_NONA1, hdbresale_latlononly_CBD_NONA2, hdbresale_latlononly_CBD_NONA3, hdbresale_latlononly_CBD_NONA4, hdbresale_latlononly_CBD_NONA5)

PROX_CBD

write.csv(PROX_CBD, "data/newaspatial/hdbresale_proxcbdfinal.csv", row.names = FALSE)

4.2.5 Combine data with hdb_updated

The following code chunk combines PROX_CBD with the updated data, hdb_updated using cbind function of base package which combines column.

Show code
hdb_updated <- cbind(hdb_updated, PROX_CBD)

4.3 Proximity to Eldercare

4.3.1 Import Eldercare data

The following code chunk import eldercare shapefile data using st_read.

Show code
eldercare_sf <- st_read(dsn = "data/independentvar/extracted", layer="ELDERCARE")

There are 133 rows of eldercare. The projected CRS is already in SVY21.

4.3.2 Calculate distances

The following code chunk:

Show code
hdb_updated_geometry <- st_as_sf(hdb_updated,
                          coords = c("LONGITUDE", 
                                     "LATITUDE"),
                          crs=4326) %>%
                          st_transform(crs = 3414)

hdb_df <- hdb_updated_geometry$geometry
hdb <- st_coordinates(hdb_df)

elderlycare_df <- eldercare_sf$geometry
elderlycare <- st_coordinates(elderlycare_df)

dist_hdb_elderlycare <- spDists(hdb, elderlycare, longlat=FALSE)

4.3.3 Write as csv to backup

Show code
write.csv(dist_hdb_elderlycare, "data/newaspatial/hdbtoeldercare.csv", row.names = FALSE)

4.3.4 Get min distances for each row

The following code chunk:

Show code
dist_hdb_elderlycare_df <- data.frame(dist_hdb_elderlycare)

dist_hdb_elderlycare_min <- dist_hdb_elderlycare_df %>% rowwise() %>% mutate(Min = min(c_across(X1:X133)))

PROX_ELDERLYCARE <- dist_hdb_elderlycare_min$Min/1000

4.3.5 Combine data with hdb_updated

The following code chunk combines PROX_ELDERLYCARE with the updated data, hdb_updated, via cbind function of base package.

Show code
hdb_updated <- cbind(hdb_updated, PROX_ELDERLYCARE)

4.3.6 Write as csv to backup

The following code chunk writes the updated data for backup purposes.

Show code
write.csv(hdb_updated, "data/newaspatial/hdbupdated1.csv", row.names = FALSE)

4.4 Proximity to Hawker Centre

4.4.1 Import Hawker data

The following code chunk import and assign the projection crs of Singapore 3414.

Show code
hawkercentre_sf <- st_read("data/independentvar/extracted/hawker-centres-geojson.geojson") %>%
                    st_transform(crs = 3414)

There are 125 rows of Hawker Centre.

4.4.2 Calculate distances

The following code chunk is similar to 4.3.2. The only difference is that we only use the hawkercentre’s first and second column (X and Y coordinate) since the st_coordinates also include the Z coodinate column in the data.

Show code
hdb_updated_geometry <- st_as_sf(hdb_updated,
                          coords = c("LONGITUDE", 
                                     "LATITUDE"),
                          crs=4326) %>%
                          st_transform(crs = 3414)

hdb_df <- hdb_updated_geometry$geometry
hdb <- st_coordinates(hdb_df)

hawkercentre_df <- hawkercentre_sf$geometry
hawkercentre <- st_coordinates(hawkercentre_df)
hawkercentre <- hawkercentre[,c(1,2)]

dist_hdb_hawkercentre <- spDists(hdb, hawkercentre, longlat=FALSE)

4.4.3 Write as csv to backup

The following code chunk is similar to 4.3.3.

Show code
write.csv(dist_hdb_hawkercentre, "data/newaspatial/hdbtohawker.csv", row.names = FALSE)

4.4.4 Get min distances for each row

The following code chunk is similar to 4.3.4.

Show code
dist_hdb_hawkercentre_df <- data.frame(dist_hdb_hawkercentre)

dist_hdb_hawkercentre_min <- dist_hdb_hawkercentre_df %>% rowwise() %>% mutate(Min = min(c_across(X1:X125)))

PROX_HAWKER <- dist_hdb_hawkercentre_min$Min/1000

4.4.5 Combine data with hdb_updated

The following code chunk is similar to 4.3.5

Show code
hdb_updated <- cbind(hdb_updated, PROX_HAWKER)

4.4.6 Write as csv to backup

Show code
write.csv(hdb_updated, "data/newaspatial/hdbupdated2.csv", row.names = FALSE)

4.5 Proximity to MRT

4.5.1 Import MRTLRT data

The following code chunk import the MRT shapefile using st_read.

Show code
mrtlrt_sf <- st_read(dsn = "data/independentvar/extracted", layer="MRTLRTStnPtt")

There are 185 rows of MRTLRT data. The project CRS is already SVY21.

4.5.2 Calculate distances

The following code chunk is similar to 4.3.2.

Show code
hdb_updated_geometry <- st_as_sf(hdb_updated,
                          coords = c("LONGITUDE", 
                                     "LATITUDE"),
                          crs=4326) %>%
                          st_transform(crs = 3414)

hdb_df <- hdb_updated_geometry$geometry
hdb <- st_coordinates(hdb_df)

mrtlrt_df <- mrtlrt_sf$geometry
mrtlrt <- st_coordinates(mrtlrt_df)

dist_hdb_mrtlrt <- spDists(hdb, mrtlrt, longlat=FALSE)

4.5.3 Write as csv to backup

The following code chunk is similar to 4.3.3

Show code
write.csv(dist_hdb_mrtlrt, "data/newaspatial/hdbtomrtlrt.csv", row.names = FALSE)

4.5.4 Get min distances for each row

The following code chunk is similar to 4.3.4

Show code
dist_hdb_mrtlrt_df <- data.frame(dist_hdb_mrtlrt)

dist_hdb_mrtlrt_min <- dist_hdb_mrtlrt_df %>% rowwise() %>% mutate(Min = min(c_across(X1:X185)))

PROX_MRTLRT <- dist_hdb_mrtlrt_min$Min/1000

4.5.5 Combine data with hdb_updated

The following code chunk is similar to 4.3.5

Show code
hdb_updated <- cbind(hdb_updated, PROX_MRTLRT)

4.5.6 Write as csv to backup

The following code chunk is similar to 4.3.6

Show code
write.csv(hdb_updated, "data/newaspatial/hdbupdated3.csv", row.names = FALSE)

4.6 Proximity to Parks

4.6.1 Import Parks data

The following code chunk import and assign the projection crs of Singapore 3414.

Show code
parks_sf <- st_read("data/independentvar/extracted/parks-geojson.geojson") %>%
                    st_transform(crs = 3414)

4.6.2 Calculate distances

The following code chunk is similar to 4.4.2

Show code
hdb_updated_geometry <- st_as_sf(hdb_updated,
                          coords = c("LONGITUDE", 
                                     "LATITUDE"),
                          crs=4326) %>%
                          st_transform(crs = 3414)

hdb_df <- hdb_updated_geometry$geometry
hdb <- st_coordinates(hdb_df)

parks_df <- parks_sf$geometry
parks <- st_coordinates(parks_df)
parks <- parks[,c(1,2)]

dist_hdb_parks <- spDists(hdb, parks, longlat=FALSE)

4.6.3 Write as csv to backup

The following code chunk is similar to 4.4.3

Show code
write.csv(dist_hdb_parks, "data/newaspatial/hdbtoparks.csv", row.names = FALSE)

4.6.4 Get min distances for each row

The following code chunk is similar to 4.4.4

Show code
dist_hdb_parks_df <- data.frame(dist_hdb_parks)

dist_hdb_parks_min <- dist_hdb_parks_df %>% rowwise() %>% mutate(Min = min(c_across(X1:X350)))

PROX_PARKS <- dist_hdb_parks_min$Min/1000

4.6.5 Combine data with hdb_updated

The following code chunk is similar to 4.4.5

Show code
hdb_updated <- cbind(hdb_updated, PROX_PARKS)

4.6.6 Write as csv to backup

The following code chunk is similar to 4.4.6

Show code
write.csv(hdb_updated, "data/newaspatial/hdbupdated4.csv", row.names = FALSE)

4.7 Proximity to Supermarket

4.7.1 Import Supermarket data

The following code chunk import data and assign the projection crs of Singapore 3414.

Show code
supermarkets_sf <- st_read("data/independentvar/extracted/supermarkets-geojson.geojson") %>%
                    st_transform(crs = 3414)

There are 526 rows of supermarket.

4.7.2 Calculate distances

The following code chunk is similar to 4.4.2

Show code
hdb_updated_geometry <- st_as_sf(hdb_updated,
                          coords = c("LONGITUDE", 
                                     "LATITUDE"),
                          crs=4326) %>%
                          st_transform(crs = 3414)

hdb_df <- hdb_updated_geometry$geometry
hdb <- st_coordinates(hdb_df)

supermarkets_df <- supermarkets_sf$geometry
supermarkets <- st_coordinates(supermarkets_df)
supermarkets <- supermarkets[,c(1,2)]

dist_hdb_supermarkets <- spDists(hdb, supermarkets, longlat=FALSE)

4.7.3 Write as csv to backup

The following code chunk is similar to 4.4.3

Show code
write.csv(dist_hdb_supermarkets, "data/newaspatial/hdbtosupermarkets.csv", row.names = FALSE)

4.7.4 Get min distances for each row

The following code chunk is similar to 4.4.4

Show code
dist_hdb_supermarkets_df <- data.frame(dist_hdb_supermarkets)

dist_hdb_supermarkets_min <- dist_hdb_supermarkets_df %>% rowwise() %>% mutate(Min = min(c_across(X1:X526)))

PROX_SUPERMARKETS <- dist_hdb_supermarkets_min$Min/1000

4.7.5 Combine data with hdb_updated

The following code chunk is similar to 4.4.5

Show code
hdb_updated <- cbind(hdb_updated, PROX_SUPERMARKETS)

4.7.6 Write as csv to backup

The following code chunk is similar to 4.4.6

Show code
write.csv(hdb_updated, "data/newaspatial/hdbupdated5.csv", row.names = FALSE)

4.8 Proximity to CHAS clinics

4.8.1 Import CHAS clinics data

The following code chunk imports data and assigned projection crs of Singapore 3414.

Show code
clinics_sf <- st_read("data/independentvar/extracted/chas-clinics-geojson.geojson") %>%
                    st_transform(crs = 3414)

There are 1167 rows of CHAS clinics.

4.8.2 Calculate distances

The following code chunk is similar to 4.4.2

Show code
hdb_updated_geometry <- st_as_sf(hdb_updated,
                          coords = c("LONGITUDE", 
                                     "LATITUDE"),
                          crs=4326) %>%
                          st_transform(crs = 3414)

hdb_df <- hdb_updated_geometry$geometry
hdb <- st_coordinates(hdb_df)

clinics_df <- clinics_sf$geometry
clinics <- st_coordinates(clinics_df)
clinics <- clinics[,c(1,2)]

dist_hdb_clinics <- spDists(hdb, clinics, longlat=FALSE)

4.8.3 Write as csv to backup

The following code chunk is similar to 4.4.3

Show code
write.csv(dist_hdb_clinics, "data/newaspatial/hdbtoclinics.csv", row.names = FALSE)

4.8.4 Get min distances for each row

The following code chunk is similar to 4.4.4

Show code
dist_hdb_clinics_df <- data.frame(dist_hdb_clinics)

dist_hdb_clinics_min <- dist_hdb_clinics_df %>% rowwise() %>% mutate(Min = min(c_across(X1:X1167)))

PROX_CLINICS <- dist_hdb_clinics_min$Min/1000

4.8.5 Combine data with hdb_updated

The following code chunk is similar to 4.4.5

Show code
hdb_updated <- cbind(hdb_updated, PROX_CLINICS)

4.8.6 Write as csv to backup

The following code chunk is similar to 4.4.6

Show code
write.csv(hdb_updated, "data/newaspatial/hdbupdated6.csv", row.names = FALSE)

4.9 Proximity to Kindergartens and Number of Kindergartens within 350m

4.9.1 Import kindergartens data

The following code chunks import data and assigned projected crs of 3414.

Show code
kindergartens_sf <- st_read("data/independentvar/extracted/kindergartens-geojson.geojson") %>%
                    st_transform(crs = 3414)

There are 448 rows of kindergartens.

4.9.2 Calculate distances

The following code chunk is similar to 4.4.2

Show code
hdb_updated_geometry <- st_as_sf(hdb_updated,
                          coords = c("LONGITUDE", 
                                     "LATITUDE"),
                          crs=4326) %>%
                          st_transform(crs = 3414)

hdb_df <- hdb_updated_geometry$geometry
hdb <- st_coordinates(hdb_df)

kindergartens_df <- kindergartens_sf$geometry
kindergartens <- st_coordinates(kindergartens_df)
kindergartens <- kindergartens[,c(1,2)]

dist_hdb_kindergartens <- spDists(hdb, kindergartens, longlat=FALSE)

4.9.3 Write as csv to backup

The following code chunk is similar to 4.4.3

Show code
write.csv(dist_hdb_kindergartens, "data/newaspatial/hdbtokindergartens.csv", row.names = FALSE)

4.9.4 Get min distances for each row

The following code chunk is similar to 4.4.4

Show code
dist_hdb_kindergartens_df <- data.frame(dist_hdb_kindergartens)

dist_hdb_kindergartens_min <- dist_hdb_kindergartens_df %>% rowwise() %>% mutate(Min = min(c_across(X1:X448)))

PROX_KINDERGARTENS <- dist_hdb_kindergartens_min$Min/1000

4.9.5 Count the number of point below 350m in each row

The following code chunk calculates the number of points with distance lower than 350m, which means number of points within 350m, using rowSum function of base package.

Show code
dist_hdb_kindergartens_df$num_within_350m <- rowSums(dist_hdb_kindergartens_df <= 350)

NUM_KINDERGARDEN_350M <- dist_hdb_kindergartens_df$num_within_350m

4.9.6 Combine data with hdb_updated

The following code chunk combines PROX_KINDERGARTENS and NUM_KINDERGARDEN_350M with the most updated data, hdb_updated, via cbind function of base package.

Show code
hdb_updated <- cbind(hdb_updated, PROX_KINDERGARTENS, NUM_KINDERGARDEN_350M)

4.9.7 Write as csv to backup

The following code chunk is similar to 4.4.6

Show code
write.csv(hdb_updated, "data/newaspatial/hdbupdated7.csv", row.names = FALSE)

4.10 Proximity to childcare centres and Numbers of childcare centres within 350m

4.10.1 Import childcare centres data

The following code chunks import data and assigned projected crs of 3414.

Show code
childcare_sf <- st_read("data/independentvar/extracted/child-care-services-geojson.geojson") %>%
                    st_transform(crs = 3414)

There are 1545 rows of childcare centres.

4.10.2 Calculate distances

The following code chunk is similar to 4.4.2

Show code
hdb_updated_geometry <- st_as_sf(hdb_updated,
                          coords = c("LONGITUDE", 
                                     "LATITUDE"),
                          crs=4326) %>%
                          st_transform(crs = 3414)

hdb_df <- hdb_updated_geometry$geometry
hdb <- st_coordinates(hdb_df)

childcare_df <- childcare_sf$geometry
childcare <- st_coordinates(childcare_df)
childcare <- childcare[,c(1,2)]

dist_hdb_childcare <- spDists(hdb, childcare, longlat=FALSE)

4.10.3 Write as csv to backup

The following code chunk is similar to 4.4.3

Show code
write.csv(dist_hdb_childcare, "data/newaspatial/hdbtochildcare.csv", row.names = FALSE)

4.10.4 Get min distances for each row

The following code chunk is similar to 4.4.4

Show code
dist_hdb_childcare_df <- data.frame(dist_hdb_childcare)

dist_hdb_childcare_min <- dist_hdb_childcare_df %>% rowwise() %>% mutate(Min = min(c_across(X1:X1545)))

PROX_CHILDCARE <- dist_hdb_childcare_min$Min/1000

4.10.5 Count the number of point below 350m in each row

The following code chunk is similar to 4.9.5

Show code
dist_hdb_childcare_df$num_within_350m <- rowSums(dist_hdb_childcare_df <= 350)

NUM_CHILDCARE_350M <- dist_hdb_childcare_df$num_within_350m

4.10.6 Combine data with hdb_updated

The following code chunk is similar to 4.9.6

Show code
hdb_updated <- cbind(hdb_updated, PROX_CHILDCARE, NUM_CHILDCARE_350M)

4.10.7 Write as csv to backup

The following code chunk is similar to 4.4.6

Show code
write.csv(hdb_updated, "data/newaspatial/hdbupdated8.csv", row.names = FALSE)

4.11 Proximity to bus stop and Numbers of bus stop within 350m

4.11.1 Import bus stops data

The following code chunk imports busstop shapefile.

Show code
busstop_sf <- st_read(dsn = "data/independentvar/extracted", layer="BusStop")

There are 5156 rows of eldercare. The projected CRS is already in SVY21.

4.11.2 Calculate distance

The following code chunk is similar to 4.4.2

Show code
hdb_updated_geometry <- st_as_sf(hdb_updated,
                          coords = c("LONGITUDE", 
                                     "LATITUDE"),
                          crs=4326) %>%
                          st_transform(crs = 3414)

hdb_df <- hdb_updated_geometry$geometry
hdb <- st_coordinates(hdb_df)

busstop_df <- busstop_sf$geometry
busstop <- st_coordinates(busstop_df)
busstop <- busstop[,c(1,2)]

dist_hdb_busstop <- spDists(hdb, busstop, longlat=FALSE)

4.11.3 Write as csv to backup

The following code chunk is similar to 4.4.3

Show code
write.csv(dist_hdb_busstop, "data/newaspatial/hdbtobusstop.csv", row.names = FALSE)

4.11.4 Get min distances for each row

The following code chunk is similar to 4.4.4

Show code
dist_hdb_busstop_df <- data.frame(dist_hdb_busstop)

dist_hdb_busstop_min <- dist_hdb_busstop_df %>% rowwise() %>% mutate(Min = min(c_across(X1:X5156)))

PROX_BUSSTOP <- dist_hdb_busstop_min$Min/1000

4.11.5 Count the number of point below 350m in each row

The following code chunk is similar to 4.9.5

Show code
dist_hdb_busstop_df$num_within_350m <- rowSums(dist_hdb_busstop_df <= 350)
NUM_BUSSTOP_350M <- dist_hdb_busstop_df$num_within_350m

4.11.6 Combine data with hdb_updated

The following code chunk is similar to 4.9.6

Show code
hdb_updated <- cbind(hdb_updated, PROX_BUSSTOP, NUM_BUSSTOP_350M)

4.11.7 Write as csv to backup

The following code chunk is similar to 4.4.6

Show code
write.csv(hdb_updated, "data/newaspatial/hdbupdated9.csv", row.names = FALSE)

4.12 Numbers of primary school within 1km

4.12.1 Import school data

The following code chunk reads the csv file of school using read_csv function of readr package.

Show code
schools_df <- read_csv("data/independentvar/extracted/Singapore+Schools-2021-10-21.csv")

There are 367 rows of schools data.

4.12.2 Filter Primary school

The following code chunk filters rows that contains PRIMRY SCHOOL in the school name using grepl function of base

Show code
primary_schools_df <- schools_df %>% filter(grepl('PRIMARY SCHOOL', Name))

4.12.3 Get Lat Long for Primary School with postalcode

The following code chunk is similar to 4.1.2

Show code
for (i in 1:nrow(primary_schools_df)) {
  postalcode <- primary_schools_df[i,'Zip/Postal Code']
  
  url = paste('https://developers.onemap.sg/commonapi/search?searchVal=', postalcode, '&returnGeom=Y&getAddrDetails=Y&pageNum=1')
  encoded_url <- URLencode(url)
  
  getcoordinate <- GET(encoded_url)
  
  jsonParsed <- content(getcoordinate,as="parsed")
  
  if (length(jsonParsed$results) > 0) {
    primary_schools_df[i,'LATITUDE'] = jsonParsed$results[[1]]$LATITUDE
    primary_schools_df[i,'LONGITUDE'] = jsonParsed$results[[1]]$LONGITUDE
  }
}

4.12.4 Check NA

Show code
sum(is.na(primary_schools_df$LATITUDE))

There are three rows of schools with missing NA.

4.12.5 Assign latlong manually

The following code chunk assign lat long manually (searched from google) to the rows with missing LATITUDE and LONGITUDE

Show code
primary_schools_df[8,'LATITUDE'] = '1.2753515360141332'
primary_schools_df[8,'LONGITUDE'] = '103.83994593723308'

primary_schools_df[37,'LATITUDE'] = '1.2751443823249178'
primary_schools_df[37,'LONGITUDE'] = '103.82391208407438'

primary_schools_df[100,'LATITUDE'] = '1.3252154625658445'
primary_schools_df[100,'LONGITUDE'] = '103.88172874174562'

4.12.6 Select only necessary column

The following code chunk only selects necessary column.

Show code
primary_schools_df_selected <- primary_schools_df %>% select('Name', 'Address', 'Zip/Postal Code', 'LATITUDE', 'LONGITUDE')

There are 156 rows of primary school after filtering.

4.12.7 Write as csv to backup

Show code
write.csv(primary_schools_df_selected, "data/newaspatial/primaryschoollatlong.csv", row.names = FALSE)

4.12.8 Convert Latlong to geometry

The following code chunk converts the dataset with latlong decimal degree to metres coordinates, and transform the crs to Singapore’s projection, 3414.

Show code
primary_schools_sf <- st_as_sf(primary_schools_df_selected,
                    coords = c("LONGITUDE", 
                               "LATITUDE"),
                    crs=4326) %>%
  st_transform(crs = 3414)

4.12.9 Calculate distance

The following code chunk is similar to 4.4.2

Show code
hdb_updated_geometry <- st_as_sf(hdb_updated,
                          coords = c("LONGITUDE", 
                                     "LATITUDE"),
                          crs=4326) %>%
                          st_transform(crs = 3414)

hdb_df <- hdb_updated_geometry$geometry
hdb <- st_coordinates(hdb_df)

prisch_df <- primary_schools_sf$geometry
prisch <- st_coordinates(prisch_df)
prisch <- prisch[,c(1,2)]

dist_hdb_prisch <- spDists(hdb, prisch, longlat=FALSE)

4.12.8 Write as csv to backup

Show code
write.csv(dist_hdb_prisch, "data/newaspatial/hdbtoprisch.csv", row.names = FALSE)

4.12.9 Get min distances for each row

The following code chunk is similar to 4.4.4

Show code
dist_hdb_prisch_df <- data.frame(dist_hdb_prisch)

dist_hdb_prisch_min <- dist_hdb_prisch_df %>% rowwise() %>% mutate(Min = min(c_across(X1:X156)))

PROX_PRISCH <- dist_hdb_prisch_min$Min/1000

4.12.10 Count the number of point below 1km in each row

The following code chunk is similar to 4.9.5. The only difference is that we are counting the number of point below 1km.

Show code
dist_hdb_prisch_df$num_within_1km <- rowSums(dist_hdb_prisch_df <= 1000)

NUM_PRISCH_1KM <- dist_hdb_prisch_df$num_within_1km

4.12.11 Combine data with hdb_updated

The following code chunk is similar to 4.9.6

Show code
hdb_updated <- cbind(hdb_updated, PROX_PRISCH, NUM_PRISCH_1KM)

4.12.12 Write as csv to backup

Show code
write.csv(hdb_updated, "data/newaspatial/hdbupdated10.csv", row.names = FALSE)

4.13 Storey Level Dummy Variable

The following code chunk creates dummy variable for storey_range column using dummy function of dummies package.

Show code
hdb_updated1 <- cbind(hdb_updated, dummy(hdb_updated$storey_range, sep = "_")) 

The following code chunk replace the dummy’s column name from “hdb_updated_01 TO 03” format to “storey_01TO03” format using gsub function of base package.

Show code
names(hdb_updated1) = gsub("hdb_updated_", "storey_", names(hdb_updated1))
names(hdb_updated1) = gsub(" ", "", names(hdb_updated1))

The following code chunk then select all the require column in hdb_final using select function

Show code
hdb_final <- hdb_updated1 %>%
            select(address, storey_range, LATITUDE, LONGITUDE, resale_price, floor_area_sqm, remaining_lease_year, PROX_CBD, PROX_ELDERLYCARE, PROX_HAWKER, PROX_MRTLRT, PROX_PARKS, PROX_SUPERMARKETS, PROX_CLINICS, NUM_KINDERGARDEN_350M, NUM_CHILDCARE_350M, NUM_BUSSTOP_350M, NUM_PRISCH_1KM, `storey_01TO03`, `storey_04TO06`, `storey_07TO09`, `storey_10TO12`, `storey_13TO15`, `storey_16TO18`, `storey_19TO21`, `storey_22TO24`, `storey_25TO27`, `storey_28TO30`, `storey_31TO33`, `storey_34TO36`, `storey_37TO39`, `storey_40TO42`, `storey_43TO45`, `storey_46TO48`, `storey_49TO51`)

The following code chunk save hdb_final as csv using write.csv function

Show code
write.csv(hdb_final, "data/aspatial/hdb_final.csv", row.names = FALSE)

5. Data Import

5.1 Geospatial Data

The following code chunk import MP14_SUBZONE_WEB_PL using st_read function of sf package.

Show code
mpsz = st_read(dsn = "data/geospatial", layer = "MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source 
  `C:\kwekyichen\IS415_blog\_posts\2021-10-27-takehome-ex03\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21

The following code chunk update the EPSG code 3414 using st_transform of sf package and retrieve coordinate using st_crs of sf package

Show code
mpsz_svy21 <- st_transform(mpsz, 3414)
st_crs(mpsz_svy21)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

We can see that the EPSG is 3414 which is correct.

5.2 Aspatial data

5.2.1 hdb_final data

The following code chunk read hdb_final csv file as tibble data frame using read_csv function and glimpse to display the data

Show code
hdb_final <- read_csv("data/aspatial/hdb_final.csv")
glimpse(hdb_final)
Rows: 15,901
Columns: 35
$ address               <chr> "8 ST. GEORGE'S LANE", "1 ST. GEORGE'S~
$ storey_range          <chr> "01 TO 03", "04 TO 06", "01 TO 03", "0~
$ LATITUDE              <dbl> 1.322127, 1.322143, 1.322102, 1.322123~
$ LONGITUDE             <dbl> 103.8598, 103.8605, 103.8621, 103.8622~
$ resale_price          <dbl> 435000, 370000, 440000, 470000, 488000~
$ floor_area_sqm        <dbl> 93, 82, 91, 93, 104, 93, 93, 104, 104,~
$ remaining_lease_year  <dbl> 62, 56, 64, 64, 64, 64, 60, 64, 76, 60~
$ PROX_CBD              <dbl> 7.047, 7.053, 6.795, 6.795, 5.879, 7.0~
$ PROX_ELDERLYCARE      <dbl> 0.4742022, 0.4081859, 0.2889761, 0.287~
$ PROX_HAWKER           <dbl> 0.4852265, 0.4262707, 0.3345345, 0.334~
$ PROX_MRTLRT           <dbl> 0.3704969, 0.3300307, 0.3029719, 0.306~
$ PROX_PARKS            <dbl> 0.7013871, 0.6186203, 0.4548956, 0.446~
$ PROX_SUPERMARKETS     <dbl> 0.06992773, 0.13996918, 0.30870224, 0.~
$ PROX_CLINICS          <dbl> 0.15861481, 0.08108943, 0.12433439, 0.~
$ NUM_KINDERGARDEN_350M <dbl> 1, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 1,~
$ NUM_CHILDCARE_350M    <dbl> 4, 4, 5, 5, 4, 4, 4, 4, 4, 4, 6, 5, 4,~
$ NUM_BUSSTOP_350M      <dbl> 4, 5, 5, 5, 3, 5, 5, 5, 3, 5, 1, 5, 5,~
$ NUM_PRISCH_1KM        <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
$ storey_01TO03         <dbl> 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,~
$ storey_04TO06         <dbl> 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1,~
$ storey_07TO09         <dbl> 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0,~
$ storey_10TO12         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ storey_13TO15         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ storey_16TO18         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ storey_19TO21         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ storey_22TO24         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ storey_25TO27         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ storey_28TO30         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ storey_31TO33         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ storey_34TO36         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ storey_37TO39         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ storey_40TO42         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ storey_43TO45         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ storey_46TO48         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ storey_49TO51         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~

There are 15901 rows of data and 35 columns as shown above.

5.2.2 Check if any NA

The following code chunk Checks if there are any missing values in the data using sum and is.na function of base package

Show code
sum(is.na(hdb_final))
[1] 0

There are no missing values in the dataset.

5.3 Convert aspatial data frame to simple feature data frame

The following code chunk converts hdb_final dataframe into sf dataframe using st_as_sf function of sf package. The coordinate is then converted into svy21 EPSG:3414 FROM WGS84 using st_transform function.

Show code
hdb_final_sf <- st_as_sf(hdb_final,
                            coords = c("LONGITUDE", "LATITUDE"),
                            crs=4326) %>%
                  st_transform(crs=3414)
head(hdb_final_sf)
Simple feature collection with 6 features and 33 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 30853.96 ymin: 33816.31 xmax: 31212.98 ymax: 33821.39
Projected CRS: SVY21 / Singapore TM
# A tibble: 6 x 34
  address             storey_range resale_price floor_area_sqm remaining_lease~
  <chr>               <chr>               <dbl>          <dbl>            <dbl>
1 8 ST. GEORGE'S LANE 01 TO 03           435000             93               62
2 1 ST. GEORGE'S RD   04 TO 06           370000             82               56
3 23 ST. GEORGE'S RD  01 TO 03           440000             91               64
4 14 ST. GEORGE'S RD  07 TO 09           470000             93               64
5 10 ST. GEORGE'S RD  01 TO 03           488000            104               64
6 20 ST. GEORGE'S RD  01 TO 03           385000             93               64
# ... with 29 more variables: PROX_CBD <dbl>, PROX_ELDERLYCARE <dbl>,
#   PROX_HAWKER <dbl>, PROX_MRTLRT <dbl>, PROX_PARKS <dbl>,
#   PROX_SUPERMARKETS <dbl>, PROX_CLINICS <dbl>,
#   NUM_KINDERGARDEN_350M <dbl>, NUM_CHILDCARE_350M <dbl>,
#   NUM_BUSSTOP_350M <dbl>, NUM_PRISCH_1KM <dbl>,
#   storey_01TO03 <dbl>, storey_04TO06 <dbl>, storey_07TO09 <dbl>,
#   storey_10TO12 <dbl>, storey_13TO15 <dbl>, ...

6. EDD

6.1 Statistical Graph

6.1.1 Resale Price

The following code chunk plots the distribution of resale_price using ggplot function of ggplot2 package.

Show code
ggplot(data=hdb_final_sf, aes(x= `resale_price`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

From the above, we can see a normal distribution. Majority of the four room hdb were transacted for about $300000 to 400000.

6.1.2 Other variables

Next, we are interested to see the distribution of other variables.

The following code chunk plots multiple distribution of other variables using ggplot function of ggplot2 package and arrange them in columns of 3 using ggarrange function of ggpubr package. Note that NUM_KINDERGARTENS_350M, NUM_CHILDCARE_350M, NUM_BUSSTOPS_350M and NUM_PRISCH_1KM are not plotted because the data is in count.

Show code
floor_area_sqm <- ggplot(data=hdb_final_sf, aes(x= `floor_area_sqm`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

remaining_lease_year <- ggplot(data=hdb_final_sf, aes(x= `remaining_lease_year`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

PROX_CBD <- ggplot(data=hdb_final_sf, aes(x= `PROX_CBD`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

PROX_ELDERLYCARE <- ggplot(data=hdb_final_sf, aes(x= `PROX_ELDERLYCARE`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

PROX_HAWKER <- ggplot(data=hdb_final_sf, aes(x= `PROX_HAWKER`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

PROX_MRTLRT <- ggplot(data=hdb_final_sf, aes(x= `PROX_MRTLRT`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

PROX_PARKS <- ggplot(data=hdb_final_sf, aes(x= `PROX_PARKS`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

PROX_SUPERMARKETS <- ggplot(data=hdb_final_sf, aes(x= `PROX_SUPERMARKETS`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

PROX_CLINICS <- ggplot(data=hdb_final_sf, aes(x= `PROX_CLINICS`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

ggarrange(floor_area_sqm, remaining_lease_year, PROX_CBD, PROX_ELDERLYCARE, PROX_HAWKER, PROX_MRTLRT, PROX_PARKS, PROX_SUPERMARKETS, PROX_CLINICS, ncol = 3, nrow = 3)

From the above, we can see that most of variables are normally distributed. The remaining lease year seems to be normally distributed but the only exception is that it has a high count of transacted four-room hdb with remaining_lease_year of above 90 years. There is no need for any transformation since the variables are normally distributed.

6.2 Statiscal Point Map

Resale Price

The following code chunk:

Show code
tmap_mode("view")
tm_shape(mpsz_svy21)+
  tm_polygons()+
  tmap_options(check.and.fix = TRUE) +
tm_shape(hdb_final_sf) +  
  tm_dots(col = "resale_price",
          alpha = 0.6,
          style="quantile") +
  tm_view(set.zoom.limits = c(10,15))
Show code
tmap_mode("plot")

From the interactive map, we can see that hdb in central and south area tends to have higher resale price (darker red points) while hdb in the west have lower resale price as seen that there are more light yellow dots in the west.

7. Hedonic Pricing Model

7.1 Multiple Linear Regression Model

7.1.1 Relationship between Independent variables

To avoid multicolinearity, we need to ensure that the independent variables are not highly correlated before we build a multiple regression model.

The following code chunk:

Show code
corrplot(cor(hdb_final[, 6:35]), diag = FALSE, order = "AOE",
         tl.pos = "td", tl.cex = 0.7, number.cex=0.5, method = "number", type = "upper")

From the scatterplot above, we can see that the independent variable are not highly correlated to each other. Hence, we can include all of them in the subsequent model building.

7.1.2 Hedonic pricing model using multiple linear regression method

The following code chunk builds a multiple linear regression model using lm function of stats package.

Note that storey49TO51 dummy variable is excluded since because we should only have n-1 dummy variable. It will have a strong correlation with other independent variable if we were to include it.

Show code
hdb_final_mlr <- lm(formula = resale_price ~ floor_area_sqm + remaining_lease_year + PROX_CBD + PROX_ELDERLYCARE + PROX_HAWKER + PROX_MRTLRT + PROX_PARKS + PROX_SUPERMARKETS + PROX_CLINICS + NUM_KINDERGARDEN_350M + NUM_CHILDCARE_350M + NUM_BUSSTOP_350M + NUM_PRISCH_1KM + storey_01TO03 + storey_04TO06 + storey_07TO09 + storey_10TO12 + storey_13TO15 + storey_16TO18 + storey_19TO21 + storey_22TO24 + storey_25TO27 + storey_28TO30 + storey_31TO33 + storey_34TO36 + storey_37TO39 + storey_40TO42 + storey_43TO45 + storey_46TO48, data=hdb_final_sf)

summary(hdb_final_mlr)

Call:
lm(formula = resale_price ~ floor_area_sqm + remaining_lease_year + 
    PROX_CBD + PROX_ELDERLYCARE + PROX_HAWKER + PROX_MRTLRT + 
    PROX_PARKS + PROX_SUPERMARKETS + PROX_CLINICS + NUM_KINDERGARDEN_350M + 
    NUM_CHILDCARE_350M + NUM_BUSSTOP_350M + NUM_PRISCH_1KM + 
    storey_01TO03 + storey_04TO06 + storey_07TO09 + storey_10TO12 + 
    storey_13TO15 + storey_16TO18 + storey_19TO21 + storey_22TO24 + 
    storey_25TO27 + storey_28TO30 + storey_31TO33 + storey_34TO36 + 
    storey_37TO39 + storey_40TO42 + storey_43TO45 + storey_46TO48, 
    data = hdb_final_sf)

Residuals:
    Min      1Q  Median      3Q     Max 
-206792  -43837   -4773   38076  481414 

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)            552176.55   45944.42  12.018  < 2e-16 ***
floor_area_sqm           2846.40      75.82  37.543  < 2e-16 ***
remaining_lease_year     3951.51      45.91  86.066  < 2e-16 ***
PROX_CBD               -12045.92     131.23 -91.795  < 2e-16 ***
PROX_ELDERLYCARE       -14597.19     824.84 -17.697  < 2e-16 ***
PROX_HAWKER            -20294.68    1091.64 -18.591  < 2e-16 ***
PROX_MRTLRT            -29591.48    1463.09 -20.225  < 2e-16 ***
PROX_PARKS              -5484.54    1238.29  -4.429 9.52e-06 ***
PROX_SUPERMARKETS      -33703.18    3709.69  -9.085  < 2e-16 ***
PROX_CLINICS            -4357.53    5398.76  -0.807 0.419601    
NUM_KINDERGARDEN_350M    6904.94     531.68  12.987  < 2e-16 ***
NUM_CHILDCARE_350M      -5115.76     289.59 -17.665  < 2e-16 ***
NUM_BUSSTOP_350M          783.85     183.74   4.266 2.00e-05 ***
NUM_PRISCH_1KM          -1441.23     435.29  -3.311 0.000932 ***
storey_01TO03         -445761.68   45118.86  -9.880  < 2e-16 ***
storey_04TO06         -426566.67   45113.33  -9.455  < 2e-16 ***
storey_07TO09         -412429.94   45113.33  -9.142  < 2e-16 ***
storey_10TO12         -404137.43   45114.48  -8.958  < 2e-16 ***
storey_13TO15         -402869.35   45122.64  -8.928  < 2e-16 ***
storey_16TO18         -395007.62   45144.40  -8.750  < 2e-16 ***
storey_19TO21         -363971.04   45234.37  -8.046 9.13e-16 ***
storey_22TO24         -364932.98   45297.31  -8.056 8.41e-16 ***
storey_25TO27         -323362.22   45438.00  -7.117 1.15e-12 ***
storey_28TO30         -249803.07   45582.74  -5.480 4.31e-08 ***
storey_31TO33         -234063.59   46310.40  -5.054 4.37e-07 ***
storey_34TO36         -239785.16   46553.05  -5.151 2.62e-07 ***
storey_37TO39         -173581.77   46456.27  -3.736 0.000187 ***
storey_40TO42         -135850.95   47072.91  -2.886 0.003907 ** 
storey_43TO45          -92521.17   58167.26  -1.591 0.111718    
storey_46TO48          -66834.46   58162.92  -1.149 0.250536    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 63710 on 15871 degrees of freedom
Multiple R-squared:  0.7192,    Adjusted R-squared:  0.7187 
F-statistic:  1401 on 29 and 15871 DF,  p-value: < 2.2e-16

From the above, we can see that there are some independent variables that are statistically not significant. Variables such as PROX_CLINICS, storey_43TO45 and storey_46TO48 have p-value above 0.05 and are not statistically significant. We will exclude these variable when we calibrate the revised model.

The following code chunk:

Show code
hdb_final_mlr1 <- lm(formula = resale_price ~ floor_area_sqm + remaining_lease_year + PROX_CBD + PROX_ELDERLYCARE + PROX_HAWKER + PROX_MRTLRT + PROX_PARKS + PROX_SUPERMARKETS + NUM_KINDERGARDEN_350M + NUM_CHILDCARE_350M + NUM_BUSSTOP_350M + NUM_PRISCH_1KM + storey_01TO03 + storey_04TO06 + storey_07TO09 + storey_10TO12 + storey_13TO15 + storey_16TO18 + storey_19TO21 + storey_22TO24 + storey_25TO27 + storey_28TO30 + storey_31TO33 + storey_34TO36 + storey_37TO39 + storey_40TO42, data=hdb_final_sf)

ols_regress(hdb_final_mlr1)
                            Model Summary                              
----------------------------------------------------------------------
R                       0.848       RMSE                    63714.701 
R-Squared               0.719       Coef. Var                  14.695 
Adj. R-Squared          0.719       MSE                4059563136.315 
Pred R-Squared          0.718       MAE                     49921.903 
----------------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 

                                     ANOVA                                       
--------------------------------------------------------------------------------
                    Sum of                                                      
                   Squares           DF       Mean Square       F          Sig. 
--------------------------------------------------------------------------------
Regression    1.649773e+14           26      6.345282e+12    1563.046    0.0000 
Residual      6.444151e+13        15874    4059563136.315                       
Total         2.294188e+14        15900                                         
--------------------------------------------------------------------------------

                                               Parameter Estimates                                                 
------------------------------------------------------------------------------------------------------------------
                model           Beta    Std. Error    Std. Beta       t        Sig           lower          upper 
------------------------------------------------------------------------------------------------------------------
          (Intercept)     492471.796     24238.793                  20.318    0.000     444961.012     539982.580 
       floor_area_sqm       2841.525        75.530        0.168     37.621    0.000       2693.478       2989.573 
 remaining_lease_year       3954.416        45.761        0.423     86.415    0.000       3864.719       4044.112 
             PROX_CBD     -12042.669       131.166       -0.571    -91.812    0.000     -12299.770     -11785.568 
     PROX_ELDERLYCARE     -14661.764       820.820       -0.081    -17.862    0.000     -16270.665     -13052.863 
          PROX_HAWKER     -20468.605      1071.005       -0.088    -19.112    0.000     -22567.896     -18369.314 
          PROX_MRTLRT     -29671.643      1459.634       -0.096    -20.328    0.000     -32532.692     -26810.594 
           PROX_PARKS      -5518.565      1237.964       -0.021     -4.458    0.000      -7945.114      -3092.015 
    PROX_SUPERMARKETS     -34816.859      3440.566       -0.046    -10.120    0.000     -41560.759     -28072.960 
NUM_KINDERGARDEN_350M       6906.444       531.636        0.058     12.991    0.000       5864.376       7948.512 
   NUM_CHILDCARE_350M      -5102.534       289.316       -0.084    -17.637    0.000      -5669.625      -4535.442 
     NUM_BUSSTOP_350M        789.801       183.597        0.019      4.302    0.000        429.929       1149.673 
       NUM_PRISCH_1KM      -1463.671       434.420       -0.019     -3.369    0.001      -2315.183       -612.159 
        storey_01TO03    -386171.260     22650.294       -1.221    -17.049    0.000    -430568.405    -341774.115 
        storey_04TO06    -366967.637     22639.285       -1.286    -16.209    0.000    -411343.205    -322592.069 
        storey_07TO09    -352833.427     22639.223       -1.191    -15.585    0.000    -397208.873    -308457.981 
        storey_10TO12    -344549.476     22641.483       -1.114    -15.218    0.000    -388929.352    -300169.601 
        storey_13TO15    -343277.945     22656.961       -0.853    -15.151    0.000    -387688.159    -298867.732 
        storey_16TO18    -335436.258     22700.335       -0.612    -14.777    0.000    -379931.490    -290941.027 
        storey_19TO21    -304452.078     22877.920       -0.344    -13.308    0.000    -349295.395    -259608.760 
        storey_22TO24    -305371.725     23002.292       -0.290    -13.276    0.000    -350458.826    -260284.624 
        storey_25TO27    -263781.805     23279.414       -0.193    -11.331    0.000    -309412.097    -218151.514 
        storey_28TO30    -190281.671     23557.597       -0.118     -8.077    0.000    -236457.234    -144106.107 
        storey_31TO33    -174526.499     24934.671       -0.069     -6.999    0.000    -223401.283    -125651.715 
        storey_34TO36    -180255.027     25383.025       -0.065     -7.101    0.000    -230008.635    -130501.419 
        storey_37TO39    -114023.613     25207.157       -0.043     -4.523    0.000    -163432.500     -64614.727 
        storey_40TO42     -76306.689     26324.957       -0.024     -2.899    0.004    -127906.592     -24706.786 
------------------------------------------------------------------------------------------------------------------

From the above, the adjusted R-squared for the revised multiple linear regression model is 0.719.

7.1.3 Checks for multicolinearity

The following code chunk test for sign of multicolinearity using ols_vif_tol function of olsrr package.

Show code
ols_vif_tol(hdb_final_mlr1)

There are some independent variables with VIF value above 10 which means that there are sign of multicollinearity. Hence we would want to calibrate the model again by removing independent variables with VIF above 10. Independent variables to be removed includes storey_01TO03, storey_04TO06, storey_07TO09, storey_10TO12, storey_13TO15, storey_16TO18, storey_19TO21, storey_22TO24, storey_25TO27 and storey_28TO30.

Revise the model

The following code chunk:

Show code
hdb_final_mlr2 <- lm(formula = resale_price ~ floor_area_sqm + remaining_lease_year + PROX_CBD + PROX_ELDERLYCARE + PROX_HAWKER + PROX_MRTLRT + PROX_PARKS + PROX_SUPERMARKETS + NUM_KINDERGARDEN_350M + NUM_CHILDCARE_350M + NUM_BUSSTOP_350M + NUM_PRISCH_1KM + storey_31TO33 + storey_34TO36 + storey_37TO39 + storey_40TO42, data=hdb_final_sf)

Checks for multicolinearity again

Show code
ols_vif_tol(hdb_final_mlr2)
               Variables Tolerance      VIF
1         floor_area_sqm 0.8922755 1.120730
2   remaining_lease_year 0.8230281 1.215025
3               PROX_CBD 0.4680141 2.136688
4       PROX_ELDERLYCARE 0.8642598 1.157060
5            PROX_HAWKER 0.8407733 1.189381
6            PROX_MRTLRT 0.8069432 1.239245
7             PROX_PARKS 0.8287724 1.206604
8      PROX_SUPERMARKETS 0.8720835 1.146679
9  NUM_KINDERGARDEN_350M 0.8840377 1.131173
10    NUM_CHILDCARE_350M 0.7870752 1.270527
11      NUM_BUSSTOP_350M 0.8925807 1.120347
12        NUM_PRISCH_1KM 0.5821821 1.717676
13         storey_31TO33 0.9867316 1.013447
14         storey_34TO36 0.9874545 1.012705
15         storey_37TO39 0.9851150 1.015110
16         storey_40TO42 0.9902039 1.009893

From the above, We can conclude that there are no sign of multicollinearity among the independent variables since the VIF value are below 10.

Ordinary least squares regression

The following code chunk run ols_regress function of olsrr package for the new model hdb_final_mlr2

Show code
ols_regress(hdb_final_mlr2)
                            Model Summary                              
----------------------------------------------------------------------
R                       0.825       RMSE                    67836.494 
R-Squared               0.681       Coef. Var                  15.645 
Adj. R-Squared          0.681       MSE                4601789951.266 
Pred R-Squared          0.681       MAE                     52876.816 
----------------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 

                                     ANOVA                                       
--------------------------------------------------------------------------------
                    Sum of                                                      
                   Squares           DF       Mean Square       F          Sig. 
--------------------------------------------------------------------------------
Regression     1.56324e+14           16      9.770251e+12    2123.141    0.0000 
Residual      7.309483e+13        15884    4601789951.266                       
Total         2.294188e+14        15900                                         
--------------------------------------------------------------------------------

                                              Parameter Estimates                                               
---------------------------------------------------------------------------------------------------------------
                model          Beta    Std. Error    Std. Beta       t        Sig          lower         upper 
---------------------------------------------------------------------------------------------------------------
          (Intercept)    130966.555      8992.671                  14.564    0.000    113339.900    148593.210 
       floor_area_sqm      2696.456        80.210        0.159     33.617    0.000      2539.235      2853.677 
 remaining_lease_year      4457.430        46.172        0.477     96.539    0.000      4366.926      4547.933 
             PROX_CBD    -12837.514       138.113       -0.609    -92.949    0.000    -13108.231    -12566.796 
     PROX_ELDERLYCARE    -14858.147       873.307       -0.082    -17.014    0.000    -16569.927    -13146.367 
          PROX_HAWKER    -23114.663      1136.928       -0.099    -20.331    0.000    -25343.170    -20886.156 
          PROX_MRTLRT    -32911.196      1546.944       -0.106    -21.275    0.000    -35943.383    -29879.010 
           PROX_PARKS     -4179.392      1317.251       -0.016     -3.173    0.002     -6761.354     -1597.431 
    PROX_SUPERMARKETS    -32735.045      3661.137       -0.043     -8.941    0.000    -39911.288    -25558.801 
NUM_KINDERGARDEN_350M      7183.877       565.379        0.061     12.706    0.000      6075.669      8292.084 
   NUM_CHILDCARE_350M     -5197.532       307.576       -0.085    -16.898    0.000     -5800.416     -4594.649 
     NUM_BUSSTOP_350M       951.826       195.226        0.023      4.876    0.000       569.161      1334.491 
       NUM_PRISCH_1KM     -2707.816       459.614       -0.035     -5.891    0.000     -3608.712     -1806.920 
        storey_31TO33    162800.059     11394.750        0.064     14.287    0.000    140465.057    185135.061 
        storey_34TO36    156339.507     12475.395        0.056     12.532    0.000    131886.319    180792.695 
        storey_37TO39    221860.148     12094.346        0.083     18.344    0.000    198153.859    245566.437 
        storey_40TO42    259042.577     14544.218        0.080     17.811    0.000    230534.261    287550.892 
---------------------------------------------------------------------------------------------------------------

From the above, the adjusted R-squared for the revised multiple linear regression model is 0.681, lower than the initial revised model’s adjusted R-square value of 0.719.

7.1.4 Test for Non-Linearity

The following code chunk checks the linear relationsihp assumption using ols_plot_resid_fit function of olsrr.

Show code
ols_plot_resid_fit(hdb_final_mlr2)

From the above, most of the data point are scattered near the 0 line. Around the middle area with fitted value 0.00006, there are slightly more data points with residual above 0. In general, we can conclude that the relationships between the dependent variable and independent variables are linear.

7.1.5 Test for Normality Assumption

The following code chunk checks normality assumption using ols_plot_resid_hist function of oslrr package

Show code
ols_plot_resid_hist(hdb_final_mlr2)

From the above, we can see that the residual of the revised multiple linear regression model resembles a normal distribution.

7.1.6 Testing for Spatial Autocorrelation

To perform spatial autocorrelation test, we need to convert hdb_final_sf simple into a SpatialPointsDataFrame.

Export residuals

The following code chunk export the residuals of the multiple linear regression model we have build and save it as data frame

Show code
mlr.output <- as.data.frame(hdb_final_mlr2$residuals)

Combine data frame

The following code chunk combines the residual output data frame with hdb_final_sf data frame using cbind function of base package, and rename the residual column to MLR_RES.

Show code
hdb_final_res_sf <- cbind(hdb_final_sf, mlr.output) %>%
                        rename(`MLR_RES` = `hdb_final_mlr2.residuals`)

convert hdb_final_res_sf into SpatialPointsDataFrame

The following code chunk converts hdb_final_res_sf into SpatialPointsDataFrame using as_Spatial function of sf package.

Show code
hdb_final_sp <- as_Spatial(hdb_final_res_sf)
hdb_final_sp
class       : SpatialPointsDataFrame 
features    : 15901 
extent      : 11593.88, 42621.21, 28217.39, 48741.03  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
variables   : 34
names       :          address, storey_range, resale_price, floor_area_sqm, remaining_lease_year, PROX_CBD, PROX_ELDERLYCARE, PROX_HAWKER, PROX_MRTLRT, PROX_PARKS, PROX_SUPERMARKETS, PROX_CLINICS, NUM_KINDERGARDEN_350M, NUM_CHILDCARE_350M, NUM_BUSSTOP_350M, ... 
min values  :   1 CHAI CHEE RD,     01 TO 03,       218000,             74,                   45,    1.813,     3.348979e-07,  0.03333586,  0.02204073, 0.04416432,      1.648702e-08, 4.555479e-09,                     0,                  0,                0, ... 
max values  : 9B BOON TIONG RD,     49 TO 51,      1186888,            138,                   97,   29.349,         3.301637,     2.86763,    2.130606,   2.413137,          1.571317,    0.8083327,                     7,                 20,               19, ... 

Distribution of residuals

The following code chunk plots the distribution of residuals on an interactive map with mpsz_svy21 layer.

Show code
tmap_mode("view")
tm_shape(mpsz_svy21)+
  tm_polygons()+
  tmap_options(check.and.fix = TRUE) +
tm_shape(hdb_final_res_sf) +  
  tm_dots(col = "MLR_RES",
          alpha = 0.6,
          style="quantile") +
  tm_view(set.zoom.limits = c(10,15))
Show code
tmap_mode("plot")

From the above interactive plot, we can see that the residuals are not randomly distributed as there are many of the same color in some area. This means that there is a sign of spatial autocorrelation.

Moran’s I test

We can perform Moran’s I test to check if there is sign of spatial autocorrelation.

Compute distance-based weight matrix

The following code chunk first compute the distance-based weight matrix by using dnearneigh function of spdep. The lower distance bound is set to 0m, upper distance bound set to 1500m, and longlat set to FALSE which means euclidean distance.

Show code
nb <- dnearneigh(coordinates(hdb_final_sp), 0, 1500, longlat = FALSE)
summary(nb)
Neighbour list object:
Number of regions: 15901 
Number of nonzero links: 10214420 
Percentage nonzero weights: 4.039846 
Average number of links: 642.376 
Link number distribution:

   2   14   25   31   46   56   58   61   63   65   68   70   71   72 
   3   14   20    6   47    4    4    5    1    2    4    1   12    1 
  79   86   87   88   89   90   91   96   97   99  101  105  106  109 
  78    2    1    5    4    3    1    2    8    7    4    2    2    2 
 110  111  112  115  117  118  119  120  121  122  123  124  125  126 
   5    4    3    1    1   11    4    2   16    5   22    8   50   22 
 128  129  130  131  132  133  134  135  136  137  138  139  140  141 
  29    6    2   16    6    8    2    5    6   11    3   25    8    5 
 142  143  144  145  147  148  149  150  151  152  153  154  156  157 
   7    2   16   11   15    4    4    3    9   22   15   14    6    6 
 158  159  160  161  162  163  165  166  167  168  170  171  173  175 
   9    3    7   14    5    2   27    5   12    5    3   15   11    6 
 176  177  178  179  180  181  182  183  184  185  186  187  188  189 
  14    8    8   35   16    7   11   19   15    2   30   15   10    8 
 190  191  192  193  194  195  196  197  198  199  200  201  202  203 
   6   19    3    2    3   16    6   15   21   16   14   15   23    1 
 204  205  206  207  208  209  210  211  212  213  214  215  216  217 
  15   19    8    4   25    1    6    4    8   10    4   14    9    2 
 218  219  220  221  222  223  224  225  226  227  228  229  230  231 
   4   12   27   30   12    5   33    6    5   19   25    5   10    9 
 232  233  234  235  236  237  238  239  240  241  242  243  244  245 
   4   11   15   43   18    2   13    7   28   28    9   10   13   11 
 246  247  248  249  250  251  252  253  254  255  256  257  258  259 
   7   10   10   19   18   13    5   19   22   15    8   15   16    6 
 260  261  262  263  264  265  266  267  268  269  270  271  272  273 
   4   18   23    3   24   12    3   21   26    6    1   26   34   64 
 274  275  276  277  278  279  280  281  282  283  284  285  286  287 
  14   30   18    6   65    6   53   29   30   19   51   24   17   48 
 288  289  290  291  292  293  294  295  296  297  298  299  300  301 
  55   33   29   34   66   39   16   42    5   16   28   32   25   16 
 302  303  304  305  306  307  308  309  310  311  312  313  314  315 
  37   28   21   22   24   10   13    9    8   15    5   35   10   16 
 316  317  318  319  320  321  322  323  324  325  326  327  328  329 
  22    9   49   39   34   11   14   41   28   28   35   17   21   28 
 330  331  332  333  334  335  336  337  338  339  340  341  342  343 
  18   29   12   35   38   26    3   25   40   18   15   37   12   24 
 344  345  346  347  348  349  350  351  352  353  354  355  356  357 
  15   14   11   42    5   39   26   16   25   12   36   21   31   13 
 358  359  360  361  362  363  364  365  366  367  368  369  370  371 
  26   20   20   40    7   31   36   25   17   24   17   23   16   16 
 372  373  374  375  376  377  378  379  380  381  382  383  384  385 
  26   18   49   19   31   35   10   24   12    9   14   31    5   41 
 386  387  388  389  390  391  392  393  394  395  396  397  398  399 
  29   12   31   13   54    9   20   11   13   25   45   26   15   10 
 400  401  402  403  404  405  406  407  408  410  411  412  413  414 
  31   23   16   14   29    8   17   25   24   22   25   22   21   10 
 415  416  417  418  419  420  421  422  423  424  425  426  427  428 
  26   12   13   19   72   17   19   16   45   12   17   13   45   21 
 429  430  431  432  433  434  435  436  437  438  439  440  441  442 
  18   36   32   16   29   21   18   34   10   29   27   20   28    9 
 443  444  445  446  447  448  449  450  451  452  453  454  455  456 
   8   11   33   12   12   24   28   16   18   31   17   28   18    9 
 457  458  459  460  461  462  463  464  465  466  467  468  469  470 
  16   47   34   15   17   12   13   23   15    7   14   22    8   27 
 471  472  473  474  475  476  477  478  479  480  481  482  483  484 
  11    8   16   42    9   24   14   29   27   32   21    8   12   12 
 485  486  487  488  489  490  491  492  493  494  495  496  497  498 
  31    8   10   23    6   28    9    2   10    2   11   11   22    7 
 499  500  501  502  503  504  505  506  507  508  509  510  511  512 
  22    4   10    7   22   25   14   21   26   16   18   24   25   25 
 513  514  515  516  517  518  519  520  521  522  523  524  525  526 
  27    1   18   14    6    7    6    9   12   10   13   13   10   14 
 527  528  529  530  531  532  533  534  535  536  537  538  539  540 
  10   14   53   11   19   18    9    8   11   13    9    8    4    4 
 541  542  543  544  545  546  547  548  549  550  551  552  553  554 
   5    3   19   20   21   10    8    7    6   10   13    8   17    2 
 555  556  557  558  559  560  561  562  563  564  565  566  567  568 
   6    5   10   23   15   21   10    5   13   12   21   22   27   24 
 569  570  571  572  573  574  575  576  577  578  579  580  581  582 
   9   12   24   15   22    7   17   25    7   27   21   16    5    8 
 583  584  585  586  587  588  589  590  591  592  593  594  595  596 
   6   21    4   12    9    7   17   27   15   26   13   12   15   22 
 597  598  599  600  601  602  603  604  605  606  607  608  609  610 
  18    9    8   20   10   16    6    9   15    6   20    4   13   11 
 611  612  613  614  615  616  618  619  620  621  622  623  624  625 
  17   23   14    6   14   29   30    6    7   16    3    2   21    3 
 626  627  628  629  630  631  633  634  635  636  637  638  639  640 
  13   22   14   13    6    7    7   10    5   17   21   20    4   20 
 641  642  643  644  645  646  647  648  649  650  651  652  653  654 
  16   20    5   10   17   12   33    6    1   27    7    6   11    1 
 655  656  657  658  659  660  662  663  664  665  666  668  669  670 
   9   11    9    5   17    9    4    6   13    1    2    2   31    7 
 671  672  673  674  675  676  677  678  679  680  681  682  683  684 
  18   10   16    1   11    8    6   22    9    1   17    5    6   33 
 685  686  687  688  689  690  691  692  693  694  695  696  697  698 
   6   24    9    4    5    7    7   10   20   19    9   32    1   12 
 699  700  701  702  703  704  705  706  707  708  709  710  711  712 
  17    8   13    8    3    8   13   26    9   11    9    6    4    8 
 713  714  715  716  717  718  719  720  721  722  723  724  725  726 
   5    4   31   33    8    7    8   16   24    9   14    7    4    6 
 727  728  729  730  731  732  733  734  735  736  737  738  739  740 
   5    2    2    6    9   31    4    8    3   10    1    3    2    1 
 741  742  744  745  746  747  748  749  750  751  752  753  754  755 
   8   12   16   11   14    7   28   12   27   15   23   12    2   24 
 756  757  758  759  760  761  762  763  764  765  766  767  768  769 
  11    9    2   15    7   24    9    4    6   16    8   10    9    5 
 770  771  772  773  774  775  776  777  778  779  780  781  782  783 
  16   16    3   21   14   21    8   19   14   16   13    6   29    9 
 784  785  786  787  788  789  790  791  792  793  794  795  796  797 
  16   35    3   16   26   11   10   10   12   22   17   12    3   17 
 798  799  800  801  802  803  804  805  806  808  809  810  811  812 
  17    8    3   15   24   50   15    4   20   10    3   12   43    9 
 813  814  815  816  817  818  819  820  821  822  823  824  825  826 
  42   14    9   17   10   17    6   16   14   27   16   16    7   15 
 827  828  829  830  831  832  833  834  835  836  837  838  839  840 
  14   33   12    9   12   12    7    9    9   19   17   20    4   43 
 841  842  843  844  845  846  847  848  849  850  851  852  853  854 
  14   12    3    7   18    6   11   11    6   13   17   12   18   39 
 855  856  857  858  859  860  861  862  863  865  866  867  868  869 
  12   18    2   14   13    6   15   23    4   13   21   14   10   20 
 870  871  872  873  874  875  876  877  878  879  880  881  882  883 
  23   17   14    3    4   10    5    4    5    7    8    3   19   18 
 884  885  886  887  888  889  890  891  892  893  894  895  896  897 
  12    5    6    2   10    6    3   14   15   20    4    5    7    3 
 899  900  901  902  903  904  905  906  907  908  909  910  911  912 
  16    9   16   14    9   15    7   47    5    4    6    3   23   11 
 913  914  915  916  917  918  919  920  921  922  923  924  925  926 
  23    2    8    9    1    3    9   23    4    1   14   23    4    4 
 927  928  929  930  931  932  933  934  937  938  939  941  942  943 
   7   12    2    7    4    4    6   34    5   14    1    4    1    5 
 944  945  946  948  950  951  952  953  954  955  956  957  961  962 
  13    2   17    8    5    1    2    3   24    5    6    3    2    7 
 964  965  967  969  970  971  974  975  976  977  978  979  980  981 
   3   13   17    7   29    4    2    6    8    9   21    1   20    2 
 982  983  984  985  986  987  990  992  993  994  995  996  998  999 
  18   14    1    3   12    3    4   19    7    1    5    1    1    5 
1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1014 
   7    3   14    5    2   24    5    6    5    9    5    3    8    1 
1015 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 
   4    1   15   18    6   10   13    1   14    1    7   10    3    7 
1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1041 1042 1043 1045 
   8    1    3   11    1    3   13   18    2   10    3   24   22    1 
1046 1047 1048 1049 1050 1051 1053 1054 1055 1056 1060 1061 1062 1063 
   2   11   11   23    7   18    1    3    6    6   13   11   11    5 
1065 1066 1067 1069 1070 1073 1074 1075 1079 1080 1082 1085 1086 1089 
   7    6    7    2    5    6    6    2    1   10    5   11   13    8 
1090 1092 1093 1095 1096 1097 1098 1100 1102 1103 1104 1105 1106 1107 
   5   20    1    2    4    4    5   10   11    8    3    5    2    5 
1109 1111 1112 1113 1115 1117 1118 1119 1120 1122 1123 1125 1127 1129 
  14    1   11    6    4    7   19    7    2    6    6    5    8    5 
1131 1133 1135 1137 1138 1139 1140 1141 1143 1144 1145 1146 1147 1149 
   4    1   10   13    6    9   16    2   15   20    4    3   15   12 
1150 1151 1152 1153 1154 1156 1157 1158 1159 1160 1161 1162 1164 1165 
   6    8   11    1    6   11    7   19    4    6    9   27    6    3 
1166 1168 1169 1171 1172 1173 1175 1177 1178 1179 1180 1181 1182 1186 
  17    8    6    4   27   22   20   11    7   11    2    7    3    2 
1189 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1202 1203 1204 
  20   18    3    6   16   10   18    7    4    2   15    5   10   15 
1205 1206 1207 1208 1210 1212 1213 1215 1216 1217 1218 1221 1222 1223 
   4    2    4   10   10    5    2   12   19    3   14   13   13   11 
1224 1225 1226 1227 1228 1230 1231 1232 1233 1234 1235 1236 1237 1238 
   3   11   22   46   30   11    7    4    1    5   35   17   10    5 
1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1252 1253 1254 1255 
  14    4   12    2    3    8    9    1    4    1   11    2    8    2 
1256 1258 1260 1263 1264 1266 1268 1270 1271 1272 1273 1275 1278 1279 
   7    3    2    6    8    7    2    3    6    7   12    4    3    7 
1281 1283 1284 1286 1287 1288 1289 1291 1292 1293 1294 1295 1296 1297 
   9   11    8    8    8    3    5    3    8   12   36    1    1    4 
1298 1299 1300 1302 1304 1307 1308 1309 1312 1313 1314 1315 1317 1321 
   2    4    1    9    6    1    6    8    9    6   21    4   10   18 
1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 
   6    2   12    5    2   19    2    1    8    4    1    1    3    5 
1337 1338 1339 1340 1341 1345 1346 1347 1348 1349 1352 1353 1355 1356 
  16   15    7    9   12   18   20   19    5    5   26   13    3    5 
1357 1358 1360 1361 1363 1364 1365 1368 1369 1370 1371 1372 1373 1374 
  14    3    4   11    7   11   11   17    5    6    1    3    8   12 
1376 1377 1379 1381 1382 1383 1384 1385 1386 1387 1389 1390 1393 1394 
  10    3    5    7    3   26   10    5    5   17    7    9    9   10 
1396 1397 1399 1400 1401 1403 1405 1406 1408 1410 1412 1414 1415 1416 
   7    3    8   20    6    3   14   15    2   12    6    1    4    8 
1421 1422 1423 1424 1425 1426 1427 1431 1432 1435 1436 1437 1441 1443 
   4    1    2   10    2    3    1    2    3    1    4   18    1    1 
1444 1445 1452 1454 1459 1460 1462 1463 1466 1467 1470 1472 1473 1474 
   5    2   10    2    1    9    1   18    1    4    1    1    7    2 
1475 1476 1477 1479 1480 1486 1488 1493 1494 1495 1496 1497 1499 1508 
   2    1    2    1    2    8    3    6   17    3    2    8    3    1 
1519 1523 1527 1533 1534 1538 1539 1543 1545 1547 1548 1549 1551 1552 
   2    3    2   12    1    3    4    2    6    1    2    3    2    1 
1553 1558 1561 1564 1565 1567 1569 1570 1573 1574 1577 1579 1586 1587 
   1    3    4    2    9    3    2    7    2    4    1    2    1    1 
1588 1593 1597 1598 1603 1604 1605 1607 1609 1618 1619 1624 1626 1627 
   8    3    6    3    1    1    1    2    7    1    2    1    2    1 
1634 1638 1650 1654 1655 1656 1678 1690 1701 1708 1717 1727 1739 1743 
   2   16    1    3    1    3    2    1    1    5    1    1    3    1 
1747 1753 1764 1771 1776 1785 1795 1796 1808 1824 1859 1875 1917 1922 
   6    4    3    4    1    3    4    4    5    1    3    3    1    2 
1945 1971 
   5    4 
3 least connected regions:
2914 3384 9808 with 2 links
4 most connected regions:
1901 7609 10897 12072 with 1971 links

From the above, we can see that the three least connected regions are row 2914, 3384 and 9808 with 2 links each. The most connected regions are row 1901, 7609, 10897 and 12072 with 1971 links each.

Convert output neighbour into spatial weight

The following code chunk converts nb into spatial weight using nb2listw function of spdep package.

Show code
nb_lw <- nb2listw(nb, style = 'W')
summary(nb_lw)
Characteristics of weights list object:
Neighbour list object:
Number of regions: 15901 
Number of nonzero links: 10214420 
Percentage nonzero weights: 4.039846 
Average number of links: 642.376 
Link number distribution:

   2   14   25   31   46   56   58   61   63   65   68   70   71   72 
   3   14   20    6   47    4    4    5    1    2    4    1   12    1 
  79   86   87   88   89   90   91   96   97   99  101  105  106  109 
  78    2    1    5    4    3    1    2    8    7    4    2    2    2 
 110  111  112  115  117  118  119  120  121  122  123  124  125  126 
   5    4    3    1    1   11    4    2   16    5   22    8   50   22 
 128  129  130  131  132  133  134  135  136  137  138  139  140  141 
  29    6    2   16    6    8    2    5    6   11    3   25    8    5 
 142  143  144  145  147  148  149  150  151  152  153  154  156  157 
   7    2   16   11   15    4    4    3    9   22   15   14    6    6 
 158  159  160  161  162  163  165  166  167  168  170  171  173  175 
   9    3    7   14    5    2   27    5   12    5    3   15   11    6 
 176  177  178  179  180  181  182  183  184  185  186  187  188  189 
  14    8    8   35   16    7   11   19   15    2   30   15   10    8 
 190  191  192  193  194  195  196  197  198  199  200  201  202  203 
   6   19    3    2    3   16    6   15   21   16   14   15   23    1 
 204  205  206  207  208  209  210  211  212  213  214  215  216  217 
  15   19    8    4   25    1    6    4    8   10    4   14    9    2 
 218  219  220  221  222  223  224  225  226  227  228  229  230  231 
   4   12   27   30   12    5   33    6    5   19   25    5   10    9 
 232  233  234  235  236  237  238  239  240  241  242  243  244  245 
   4   11   15   43   18    2   13    7   28   28    9   10   13   11 
 246  247  248  249  250  251  252  253  254  255  256  257  258  259 
   7   10   10   19   18   13    5   19   22   15    8   15   16    6 
 260  261  262  263  264  265  266  267  268  269  270  271  272  273 
   4   18   23    3   24   12    3   21   26    6    1   26   34   64 
 274  275  276  277  278  279  280  281  282  283  284  285  286  287 
  14   30   18    6   65    6   53   29   30   19   51   24   17   48 
 288  289  290  291  292  293  294  295  296  297  298  299  300  301 
  55   33   29   34   66   39   16   42    5   16   28   32   25   16 
 302  303  304  305  306  307  308  309  310  311  312  313  314  315 
  37   28   21   22   24   10   13    9    8   15    5   35   10   16 
 316  317  318  319  320  321  322  323  324  325  326  327  328  329 
  22    9   49   39   34   11   14   41   28   28   35   17   21   28 
 330  331  332  333  334  335  336  337  338  339  340  341  342  343 
  18   29   12   35   38   26    3   25   40   18   15   37   12   24 
 344  345  346  347  348  349  350  351  352  353  354  355  356  357 
  15   14   11   42    5   39   26   16   25   12   36   21   31   13 
 358  359  360  361  362  363  364  365  366  367  368  369  370  371 
  26   20   20   40    7   31   36   25   17   24   17   23   16   16 
 372  373  374  375  376  377  378  379  380  381  382  383  384  385 
  26   18   49   19   31   35   10   24   12    9   14   31    5   41 
 386  387  388  389  390  391  392  393  394  395  396  397  398  399 
  29   12   31   13   54    9   20   11   13   25   45   26   15   10 
 400  401  402  403  404  405  406  407  408  410  411  412  413  414 
  31   23   16   14   29    8   17   25   24   22   25   22   21   10 
 415  416  417  418  419  420  421  422  423  424  425  426  427  428 
  26   12   13   19   72   17   19   16   45   12   17   13   45   21 
 429  430  431  432  433  434  435  436  437  438  439  440  441  442 
  18   36   32   16   29   21   18   34   10   29   27   20   28    9 
 443  444  445  446  447  448  449  450  451  452  453  454  455  456 
   8   11   33   12   12   24   28   16   18   31   17   28   18    9 
 457  458  459  460  461  462  463  464  465  466  467  468  469  470 
  16   47   34   15   17   12   13   23   15    7   14   22    8   27 
 471  472  473  474  475  476  477  478  479  480  481  482  483  484 
  11    8   16   42    9   24   14   29   27   32   21    8   12   12 
 485  486  487  488  489  490  491  492  493  494  495  496  497  498 
  31    8   10   23    6   28    9    2   10    2   11   11   22    7 
 499  500  501  502  503  504  505  506  507  508  509  510  511  512 
  22    4   10    7   22   25   14   21   26   16   18   24   25   25 
 513  514  515  516  517  518  519  520  521  522  523  524  525  526 
  27    1   18   14    6    7    6    9   12   10   13   13   10   14 
 527  528  529  530  531  532  533  534  535  536  537  538  539  540 
  10   14   53   11   19   18    9    8   11   13    9    8    4    4 
 541  542  543  544  545  546  547  548  549  550  551  552  553  554 
   5    3   19   20   21   10    8    7    6   10   13    8   17    2 
 555  556  557  558  559  560  561  562  563  564  565  566  567  568 
   6    5   10   23   15   21   10    5   13   12   21   22   27   24 
 569  570  571  572  573  574  575  576  577  578  579  580  581  582 
   9   12   24   15   22    7   17   25    7   27   21   16    5    8 
 583  584  585  586  587  588  589  590  591  592  593  594  595  596 
   6   21    4   12    9    7   17   27   15   26   13   12   15   22 
 597  598  599  600  601  602  603  604  605  606  607  608  609  610 
  18    9    8   20   10   16    6    9   15    6   20    4   13   11 
 611  612  613  614  615  616  618  619  620  621  622  623  624  625 
  17   23   14    6   14   29   30    6    7   16    3    2   21    3 
 626  627  628  629  630  631  633  634  635  636  637  638  639  640 
  13   22   14   13    6    7    7   10    5   17   21   20    4   20 
 641  642  643  644  645  646  647  648  649  650  651  652  653  654 
  16   20    5   10   17   12   33    6    1   27    7    6   11    1 
 655  656  657  658  659  660  662  663  664  665  666  668  669  670 
   9   11    9    5   17    9    4    6   13    1    2    2   31    7 
 671  672  673  674  675  676  677  678  679  680  681  682  683  684 
  18   10   16    1   11    8    6   22    9    1   17    5    6   33 
 685  686  687  688  689  690  691  692  693  694  695  696  697  698 
   6   24    9    4    5    7    7   10   20   19    9   32    1   12 
 699  700  701  702  703  704  705  706  707  708  709  710  711  712 
  17    8   13    8    3    8   13   26    9   11    9    6    4    8 
 713  714  715  716  717  718  719  720  721  722  723  724  725  726 
   5    4   31   33    8    7    8   16   24    9   14    7    4    6 
 727  728  729  730  731  732  733  734  735  736  737  738  739  740 
   5    2    2    6    9   31    4    8    3   10    1    3    2    1 
 741  742  744  745  746  747  748  749  750  751  752  753  754  755 
   8   12   16   11   14    7   28   12   27   15   23   12    2   24 
 756  757  758  759  760  761  762  763  764  765  766  767  768  769 
  11    9    2   15    7   24    9    4    6   16    8   10    9    5 
 770  771  772  773  774  775  776  777  778  779  780  781  782  783 
  16   16    3   21   14   21    8   19   14   16   13    6   29    9 
 784  785  786  787  788  789  790  791  792  793  794  795  796  797 
  16   35    3   16   26   11   10   10   12   22   17   12    3   17 
 798  799  800  801  802  803  804  805  806  808  809  810  811  812 
  17    8    3   15   24   50   15    4   20   10    3   12   43    9 
 813  814  815  816  817  818  819  820  821  822  823  824  825  826 
  42   14    9   17   10   17    6   16   14   27   16   16    7   15 
 827  828  829  830  831  832  833  834  835  836  837  838  839  840 
  14   33   12    9   12   12    7    9    9   19   17   20    4   43 
 841  842  843  844  845  846  847  848  849  850  851  852  853  854 
  14   12    3    7   18    6   11   11    6   13   17   12   18   39 
 855  856  857  858  859  860  861  862  863  865  866  867  868  869 
  12   18    2   14   13    6   15   23    4   13   21   14   10   20 
 870  871  872  873  874  875  876  877  878  879  880  881  882  883 
  23   17   14    3    4   10    5    4    5    7    8    3   19   18 
 884  885  886  887  888  889  890  891  892  893  894  895  896  897 
  12    5    6    2   10    6    3   14   15   20    4    5    7    3 
 899  900  901  902  903  904  905  906  907  908  909  910  911  912 
  16    9   16   14    9   15    7   47    5    4    6    3   23   11 
 913  914  915  916  917  918  919  920  921  922  923  924  925  926 
  23    2    8    9    1    3    9   23    4    1   14   23    4    4 
 927  928  929  930  931  932  933  934  937  938  939  941  942  943 
   7   12    2    7    4    4    6   34    5   14    1    4    1    5 
 944  945  946  948  950  951  952  953  954  955  956  957  961  962 
  13    2   17    8    5    1    2    3   24    5    6    3    2    7 
 964  965  967  969  970  971  974  975  976  977  978  979  980  981 
   3   13   17    7   29    4    2    6    8    9   21    1   20    2 
 982  983  984  985  986  987  990  992  993  994  995  996  998  999 
  18   14    1    3   12    3    4   19    7    1    5    1    1    5 
1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1014 
   7    3   14    5    2   24    5    6    5    9    5    3    8    1 
1015 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 
   4    1   15   18    6   10   13    1   14    1    7   10    3    7 
1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1041 1042 1043 1045 
   8    1    3   11    1    3   13   18    2   10    3   24   22    1 
1046 1047 1048 1049 1050 1051 1053 1054 1055 1056 1060 1061 1062 1063 
   2   11   11   23    7   18    1    3    6    6   13   11   11    5 
1065 1066 1067 1069 1070 1073 1074 1075 1079 1080 1082 1085 1086 1089 
   7    6    7    2    5    6    6    2    1   10    5   11   13    8 
1090 1092 1093 1095 1096 1097 1098 1100 1102 1103 1104 1105 1106 1107 
   5   20    1    2    4    4    5   10   11    8    3    5    2    5 
1109 1111 1112 1113 1115 1117 1118 1119 1120 1122 1123 1125 1127 1129 
  14    1   11    6    4    7   19    7    2    6    6    5    8    5 
1131 1133 1135 1137 1138 1139 1140 1141 1143 1144 1145 1146 1147 1149 
   4    1   10   13    6    9   16    2   15   20    4    3   15   12 
1150 1151 1152 1153 1154 1156 1157 1158 1159 1160 1161 1162 1164 1165 
   6    8   11    1    6   11    7   19    4    6    9   27    6    3 
1166 1168 1169 1171 1172 1173 1175 1177 1178 1179 1180 1181 1182 1186 
  17    8    6    4   27   22   20   11    7   11    2    7    3    2 
1189 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1202 1203 1204 
  20   18    3    6   16   10   18    7    4    2   15    5   10   15 
1205 1206 1207 1208 1210 1212 1213 1215 1216 1217 1218 1221 1222 1223 
   4    2    4   10   10    5    2   12   19    3   14   13   13   11 
1224 1225 1226 1227 1228 1230 1231 1232 1233 1234 1235 1236 1237 1238 
   3   11   22   46   30   11    7    4    1    5   35   17   10    5 
1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1252 1253 1254 1255 
  14    4   12    2    3    8    9    1    4    1   11    2    8    2 
1256 1258 1260 1263 1264 1266 1268 1270 1271 1272 1273 1275 1278 1279 
   7    3    2    6    8    7    2    3    6    7   12    4    3    7 
1281 1283 1284 1286 1287 1288 1289 1291 1292 1293 1294 1295 1296 1297 
   9   11    8    8    8    3    5    3    8   12   36    1    1    4 
1298 1299 1300 1302 1304 1307 1308 1309 1312 1313 1314 1315 1317 1321 
   2    4    1    9    6    1    6    8    9    6   21    4   10   18 
1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 
   6    2   12    5    2   19    2    1    8    4    1    1    3    5 
1337 1338 1339 1340 1341 1345 1346 1347 1348 1349 1352 1353 1355 1356 
  16   15    7    9   12   18   20   19    5    5   26   13    3    5 
1357 1358 1360 1361 1363 1364 1365 1368 1369 1370 1371 1372 1373 1374 
  14    3    4   11    7   11   11   17    5    6    1    3    8   12 
1376 1377 1379 1381 1382 1383 1384 1385 1386 1387 1389 1390 1393 1394 
  10    3    5    7    3   26   10    5    5   17    7    9    9   10 
1396 1397 1399 1400 1401 1403 1405 1406 1408 1410 1412 1414 1415 1416 
   7    3    8   20    6    3   14   15    2   12    6    1    4    8 
1421 1422 1423 1424 1425 1426 1427 1431 1432 1435 1436 1437 1441 1443 
   4    1    2   10    2    3    1    2    3    1    4   18    1    1 
1444 1445 1452 1454 1459 1460 1462 1463 1466 1467 1470 1472 1473 1474 
   5    2   10    2    1    9    1   18    1    4    1    1    7    2 
1475 1476 1477 1479 1480 1486 1488 1493 1494 1495 1496 1497 1499 1508 
   2    1    2    1    2    8    3    6   17    3    2    8    3    1 
1519 1523 1527 1533 1534 1538 1539 1543 1545 1547 1548 1549 1551 1552 
   2    3    2   12    1    3    4    2    6    1    2    3    2    1 
1553 1558 1561 1564 1565 1567 1569 1570 1573 1574 1577 1579 1586 1587 
   1    3    4    2    9    3    2    7    2    4    1    2    1    1 
1588 1593 1597 1598 1603 1604 1605 1607 1609 1618 1619 1624 1626 1627 
   8    3    6    3    1    1    1    2    7    1    2    1    2    1 
1634 1638 1650 1654 1655 1656 1678 1690 1701 1708 1717 1727 1739 1743 
   2   16    1    3    1    3    2    1    1    5    1    1    3    1 
1747 1753 1764 1771 1776 1785 1795 1796 1808 1824 1859 1875 1917 1922 
   6    4    3    4    1    3    4    4    5    1    3    3    1    2 
1945 1971 
   5    4 
3 least connected regions:
2914 3384 9808 with 2 links
4 most connected regions:
1901 7609 10897 12072 with 1971 links

Weights style: W 
Weights constants summary:
      n        nn    S0      S1       S2
W 15901 252841801 15901 79.4628 64105.37

Perform Moran’s I test

The following code chunk performs Moran’s I test using lm.morantest function of spdep package. It compares the multiple linear regression and the spatial weight.

Show code
lm.morantest(hdb_final_mlr2, nb_lw)

    Global Moran I for regression residuals

data:  
model: lm(formula = resale_price ~ floor_area_sqm +
remaining_lease_year + PROX_CBD + PROX_ELDERLYCARE +
PROX_HAWKER + PROX_MRTLRT + PROX_PARKS + PROX_SUPERMARKETS +
NUM_KINDERGARDEN_350M + NUM_CHILDCARE_350M + NUM_BUSSTOP_350M
+ NUM_PRISCH_1KM + storey_31TO33 + storey_34TO36 +
storey_37TO39 + storey_40TO42, data = hdb_final_sf)
weights: nb_lw

Moran I statistic standard deviate = 752.22, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Observed Moran I      Expectation         Variance 
    3.847662e-01    -3.532062e-04     2.621183e-07 

From the above, the p-value is lower than 2.2e-16, which is lesser than alpha value of 0.05. Therefore, we reject the null hypothesis that residual are randomly distributed. In addition, the Observed Global Moran I test is 0.38 and it is greater than 0. This means that the residual resembles a clustering distribution.

8. Building Hedonic Pricing Models using GWmodel

In this section, we will build the Hedonic Pricing Models using Fixed and Adaptive GWR model, with the sames variable of the revised multiple linear regression model, hdb_final_mlr2.

8.1 Fixed Bandwidth GWR model

8.1.1 Computing fixed bandwidth

The following code chunk determines the optimal fixed bandwidth for the model using bw.gwr function of GWmodel package.

The following are the arguements: - Approach set as Cross-validation (“CV”) - Weighting function set as Guassian - Adaptive set as False since we are computing fixed bandwidth - longlat set as False to calculate Euclidean distance

Show code
bw.fixed <- bw.gwr(formula = resale_price ~ floor_area_sqm + remaining_lease_year + PROX_CBD + PROX_ELDERLYCARE + PROX_HAWKER +
PROX_MRTLRT + PROX_PARKS + PROX_SUPERMARKETS + NUM_KINDERGARDEN_350M + NUM_CHILDCARE_350M + NUM_BUSSTOP_350M +
NUM_PRISCH_1KM + storey_31TO33 + storey_34TO36 + storey_37TO39 + storey_40TO42, data=hdb_final_sp, approach="CV", kernel="gaussian", adaptive=FALSE, longlat=FALSE)

The output of the above code chunk will shown in screenshot format using the code chunk below because the run time is very long.

bw.fixed Screenshot

From the above screenshot, we can see that the recommended fixed bandwidth is 350.8655 metres.

8.1.2 GWModel method

The following code chunk calibrate the gwr model using gwr.basic function of GWmodel with fixed bandwidth and gaussian kernel.

Show code
gwr.fixed <- gwr.basic(formula = resale_price ~ floor_area_sqm + remaining_lease_year + PROX_CBD + PROX_ELDERLYCARE + PROX_HAWKER +
PROX_MRTLRT + PROX_PARKS + PROX_SUPERMARKETS + NUM_KINDERGARDEN_350M + NUM_CHILDCARE_350M + NUM_BUSSTOP_350M +
NUM_PRISCH_1KM + storey_31TO33 + storey_34TO36 + storey_37TO39 + storey_40TO42, data=hdb_final_sp, bw=bw.fixed, kernel="gaussian", adaptive=FALSE, longlat=FALSE)
Show code
gwr.fixed

The output of the above code chunk will shown in screenshot format using the code chunk below due to the long run time.

bw.fixed GWR Screenshot1
bw.fixed GWR Screenshot2
bw.fixed GWR Screenshot3

From the fixed bandwidth GWR model, the adjusted R-square value is 0.941, higher than the revised multiple linear regression models with adjusted R-square value of 0.681.

8.2 Adaptive Bandwidth GWR model

8.2.1 Computing adaptive bandwidth

The following code chunk determines the optimal adaptive bandwidth for the model using bw.gwr function of GWmodel package. It is similar to 8.1.1, the only difference is that the adaptive argument is set to TRUE since we are computing adaptive bandwidth.

Show code
bw.adaptive <- bw.gwr(formula = resale_price ~ floor_area_sqm + remaining_lease_year + PROX_CBD + PROX_ELDERLYCARE + PROX_HAWKER +
PROX_MRTLRT + PROX_PARKS + PROX_SUPERMARKETS + NUM_KINDERGARDEN_350M + NUM_CHILDCARE_350M + NUM_BUSSTOP_350M +
NUM_PRISCH_1KM + storey_31TO33 + storey_34TO36 + storey_37TO39 + storey_40TO42, data=hdb_final_sp, approach="CV", kernel="gaussian", adaptive=TRUE, longlat=FALSE)

The output of the above code chunk will shown in screenshot format using the code chunk below due to the long run time.

bw.adaptive Screenshot

From the screenshot above, we can see that the recommended adaptive bandwidth is 209 metres.

8.2.2 GWModel method

The following code chunk calibrate the gwr model using gwr.basic function of GWmodel with adaptive bandwidth and gaussian kernel.

Show code
gwr.adaptive <- gwr.basic(formula = resale_price ~ floor_area_sqm + remaining_lease_year + PROX_CBD + PROX_ELDERLYCARE + PROX_HAWKER +
PROX_MRTLRT + PROX_PARKS + PROX_SUPERMARKETS + NUM_KINDERGARDEN_350M + NUM_CHILDCARE_350M + NUM_BUSSTOP_350M +
NUM_PRISCH_1KM + storey_31TO33 + storey_34TO36 + storey_37TO39 + storey_40TO42, data=hdb_final_sp, bw=bw.adaptive, kernel="gaussian", adaptive=TRUE, longlat=FALSE)
Show code
gwr.adaptive

The output of the above code chunk will shown in screenshot format using the code chunk below because the run time is very long.

bw.adaptive GWR Screenshot1
bw.adaptive GWR Screenshot2

From the adaptive bandwidth GWR model, the adjusted R-square value is 0.914, higher than the revised multiple linear regression models with adjusted R-square value of 0.681, but lower than the fixed bandwidth GWR Model with adjusted R-square value of 0.941.

Comparing Multiple Linear Regression Model, Fixed Bandwidth GWR model and GWModel Adaptive Bandwidth GWR model, Fixed Bandwidth GWR model has the best adjusted R-square of 0.941. Hence, we will be using it to visualise GWR output in the next section.

9. Visualising GWR Output

9.1 Convert SDF into sf data frame

The following code chunk:

Show code
hdb_final_sf_fixed <- st_as_sf(gwr.fixed$SDF) %>%
                        st_transform(crs=3414)

The following code chunk display the columns in hdb_final_sf_fixed sf object using glimpse function of pillar package

Show code
glimpse(hdb_final_sf_fixed)

We will write the hdb_final_sf_fixed files into csv using st_write function of sf package and convert GEOMETRY into X and Y column using the layer_options argument

Show code
st_write(hdb_final_sf_fixed, "data/aspatial/hdb_final_sf_fixed.csv", layer_options = "GEOMETRY=AS_XY")

Import hdb_final_sf_fixed file for visualization

The following code chunk imports hdb_final_sf_fixed file using read_csv function.

Show code
hdb_final_sf_fixed <- read_csv("data/aspatial/hdb_final_sf_fixed.csv")

The following code chunk:

Show code
hdb_final_sf_fixed1 <- st_as_sf(hdb_final_sf_fixed,
                          coords = c("X", 
                                     "Y"),
                          crs=3414)

The following code chunk display the columns of hdb_final_sf_fixed1 sf object using glimpse function of pillar package

Show code
glimpse(hdb_final_sf_fixed1)
Rows: 15,901
Columns: 58
$ Intercept                <dbl> 67802.83, 79647.26, 93829.81, 94708~
$ floor_area_sqm           <dbl> 2011.414, 2039.657, 2081.750, 2083.~
$ remaining_lease_year     <dbl> 7426.111, 7509.791, 7696.751, 7698.~
$ PROX_CBD                 <dbl> -27583.35, -30471.34, -35646.17, -3~
$ PROX_ELDERLYCARE         <dbl> -130824.1, -122416.3, -104814.9, -1~
$ PROX_HAWKER              <dbl> -49960.47, -52496.02, -50256.52, -4~
$ PROX_MRTLRT              <dbl> -192957.1, -184646.8, -171212.2, -1~
$ PROX_PARKS               <dbl> 121851.95, 111663.05, 89809.93, 894~
$ PROX_SUPERMARKETS        <dbl> 158056.3, 139764.3, 103421.0, 10280~
$ NUM_KINDERGARDEN_350M    <dbl> -1032.7275, -1178.4605, 2209.5626, ~
$ NUM_CHILDCARE_350M       <dbl> -6674.598, -6097.525, -4752.532, -4~
$ NUM_BUSSTOP_350M         <dbl> 2852.045, 3961.953, 6780.329, 6866.~
$ NUM_PRISCH_1KM           <dbl> -25348.97, -29604.97, -41058.13, -4~
$ storey_31TO33            <dbl> 112603.13, 102415.79, 77395.72, 764~
$ storey_34TO36            <dbl> 127070.89, 116328.51, 91319.19, 904~
$ storey_37TO39            <dbl> 155193.7, 144864.5, 119865.8, 11894~
$ storey_40TO42            <dbl> 242479.7, 240244.5, 232135.0, 23221~
$ y                        <dbl> 435000, 370000, 440000, 470000, 488~
$ yhat                     <dbl> 417980.2, 372523.7, 478742.2, 48243~
$ residual                 <dbl> 17019.848, -2523.734, -38742.194, -~
$ CV_Score                 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ Stud_residual            <dbl> 0.62086976, -0.09186767, -1.4043721~
$ Intercept_SE             <dbl> 57561.19, 57600.58, 59440.10, 59563~
$ floor_area_sqm_SE        <dbl> 370.8196, 378.9356, 402.4416, 403.7~
$ remaining_lease_year_SE  <dbl> 333.9225, 310.6055, 280.0002, 279.5~
$ PROX_CBD_SE              <dbl> 5001.791, 5064.560, 5279.709, 5288.~
$ PROX_ELDERLYCARE_SE      <dbl> 31581.57, 32498.18, 33719.39, 33776~
$ PROX_HAWKER_SE           <dbl> 35936.79, 34671.79, 32269.50, 32150~
$ PROX_MRTLRT_SE           <dbl> 21996.95, 21921.63, 22484.22, 22491~
$ PROX_PARKS_SE            <dbl> 28204.54, 28314.91, 28139.99, 28112~
$ PROX_SUPERMARKETS_SE     <dbl> 48331.83, 49738.75, 51183.75, 51181~
$ NUM_KINDERGARDEN_350M_SE <dbl> 5985.000, 5820.768, 5660.172, 5663.~
$ NUM_CHILDCARE_350M_SE    <dbl> 3413.160, 3418.766, 3440.792, 3443.~
$ NUM_BUSSTOP_350M_SE      <dbl> 2363.977, 2355.700, 2287.557, 2287.~
$ NUM_PRISCH_1KM_SE        <dbl> 11550.34, 11774.78, 12058.99, 12035~
$ storey_31TO33_SE         <dbl> 31783.61, 32267.39, 31542.69, 31497~
$ storey_34TO36_SE         <dbl> 32448.21, 32381.62, 31521.37, 31476~
$ storey_37TO39_SE         <dbl> 36378.21, 37868.71, 37598.74, 37563~
$ storey_40TO42_SE         <dbl> 35037.99, 34947.49, 34783.18, 34790~
$ Intercept_TV             <dbl> 1.1779260, 1.3827511, 1.5785609, 1.~
$ floor_area_sqm_TV        <dbl> 5.424239, 5.382594, 5.172800, 5.161~
$ remaining_lease_year_TV  <dbl> 22.23903, 24.17790, 27.48837, 27.53~
$ PROX_CBD_TV              <dbl> -5.514695, -6.016582, -6.751540, -6~
$ PROX_ELDERLYCARE_TV      <dbl> -4.142419, -3.766867, -3.108445, -3~
$ PROX_HAWKER_TV           <dbl> -1.390232, -1.514084, -1.557400, -1~
$ PROX_MRTLRT_TV           <dbl> -8.771992, -8.423043, -7.614769, -7~
$ PROX_PARKS_TV            <dbl> 4.320295, 3.943614, 3.191541, 3.180~
$ PROX_SUPERMARKETS_TV     <dbl> 3.270232, 2.809968, 2.020582, 2.008~
$ NUM_KINDERGARDEN_350M_TV <dbl> -0.17255263, -0.20245792, 0.3903702~
$ NUM_CHILDCARE_350M_TV    <dbl> -1.955548, -1.783546, -1.381232, -1~
$ NUM_BUSSTOP_350M_TV      <dbl> 1.2064606, 1.6818583, 2.9640046, 3.~
$ NUM_PRISCH_1KM_TV        <dbl> -2.194651, -2.514269, -3.404774, -3~
$ storey_31TO33_TV         <dbl> 3.542805, 3.173972, 2.453682, 2.428~
$ storey_34TO36_TV         <dbl> 3.916115, 3.592424, 2.897057, 2.872~
$ storey_37TO39_TV         <dbl> 4.266116, 3.825439, 3.188027, 3.166~
$ storey_40TO42_TV         <dbl> 6.920481, 6.874441, 6.673772, 6.674~
$ Local_R2                 <dbl> 0.9287239, 0.9310459, 0.9384631, 0.~
$ geometry                 <POINT [m]> POINT (30942.4 33819.57), POI~

From the above, we can see the gwr outputs such as Local_R2, residuals, coefficient standard error, and observed and predicted y values. We will visualise them in the next section.

The following code chunk checks the statistics of the predicted value yhat using summary function

Show code
summary(hdb_final_sf_fixed1$yhat)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 236602  354981  406736  433767  465594 1077240 

From the summary statistics, the lowest predicted price for 4-room hdb flat is 236602 and the highest predicted price is 1077240. The average is 433767.

9.2 Visualization and Summary Statistics

9.2.1 Local_R2

Visualization

The following code chunk plots the interactive point maps of Local_R2 using tmap package function.

Show code
tmap_mode("view")
tm_shape(mpsz_svy21)+
  tm_polygons() +
  tmap_options(check.and.fix = TRUE) +
tm_shape(hdb_final_sf_fixed1) +  
  tm_dots(col = "Local_R2",
          border.col = "gray60",
          border.lwd = 1) +
  tm_view(set.zoom.limits = c(10,15))
Show code
tmap_mode("plot")

From the Local_R2 map, we can see that in general, the fixed GWR model predicts well as shown by many red points on the map. Region such as North-east and East are predicted poorly by the fixed GWR model as there are a few yellow points and quite a number of orange point in these two region, and this means that the Local_R2 value is low and range from 0.2 to 0.4, and 0.6 to 0.8 respectively. We would want to analyse these region further.

Summary statistics

The following code chunk checks the statistics of Local_R2 using summary function

Show code
summary(hdb_final_sf_fixed1$Local_R2)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2329  0.8321  0.9152  0.8661  0.9560  1.0000 

From the summary statistics, the min is 0.2329 and the max is 1. The 1st quartile and median is already 0.8321 and 0.915 respectively which is high. The mean is also high at 0.8661. This shows that the Local_R2 are high in general and have some low values as seen from the large difference between the min and first quartile.

9.2.1.1 Local_R2 by planning area

North-east

The following code chunk plots an interactive map within the north-east planning area.

Show code
tmap_mode("view")
tm_shape(mpsz_svy21[mpsz_svy21$REGION_N=="NORTH-EAST REGION", ])+
  tm_polygons() +
  tmap_options(check.and.fix = TRUE) +
tm_shape(hdb_final_sf_fixed1) +  
  tm_dots(col = "Local_R2",
          border.col = "gray60",
          border.lwd = 1) +
  tm_view(set.zoom.limits = c(10,15))
Show code
tmap_mode("plot")

In the North-east region, we can see that there is particularly one area with many yellow points which means these area are poorly predicted by the fixed GWR model. If we toggle the mpsz_svy21 layer off and turn on open street map view, we can see that the area is Fernvale. There are also two areas with many orange points. From the open street map view, the areas are Damai, Oasis and Kadaloor in Punggol, and Sengkang South.

East

The following code chunk plots an interactive map within the East planning area.

Show code
tmap_mode("view")
tm_shape(mpsz_svy21[mpsz_svy21$REGION_N=="EAST REGION", ])+
  tm_polygons() +
  tmap_options(check.and.fix = TRUE) +
tm_shape(hdb_final_sf_fixed1) +  
  tm_dots(col = "Local_R2",
          border.col = "gray60",
          border.lwd = 1) +
  tm_view(set.zoom.limits = c(10,15))
Show code
tmap_mode("plot")

In the east planning area, there are two areas with many yellow and orange points. If we toggle the mpsz_svy21 layer off and turn OpenStreetMap on, we can see that these areas are Tampines North, Tampines East, and around Pasir Ris Dr 12.

9.2.2 Observed and Predicted Y

Visualization

The following code chunk plots the spatial point map of Observed and Predicted Y side by side using tmap_arrange function of tmap package.

Show code
ymap <- tm_shape(mpsz_svy21)+
          tm_polygons() +
          tmap_options(check.and.fix = TRUE) +
        tm_shape(hdb_final_sf_fixed1) +  
          tm_dots(col = "y",
                  border.col = "gray60",
                  border.lwd = 1) +
        tm_layout(title="Observed Y")

yhatmap <- tm_shape(mpsz_svy21)+
              tm_polygons() +
              tmap_options(check.and.fix = TRUE) +
            tm_shape(hdb_final_sf_fixed1) +  
              tm_dots(col = "yhat",
                      border.col = "gray60",
                      border.lwd = 1) +
        tm_layout(title="Predicted Y")
        
tmap_arrange(ymap, yhatmap)

From the comparison, we can see that it looks the same in general which means that the prediction by the fixed GWR model is good.

Summary statistics

The following code chunk checks the statistics of the predicted value yhat using summary function

Show code
summary(hdb_final_sf_fixed1$yhat)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 236602  354981  406736  433767  465594 1077240 

From the summary statistics, the lowest predicted price for 4-room hdb flat is 236602 and the highest predicted price is 1077240. The average is 433767. This shows that there are only some 4 hdb flats with extremely high predicted price of over 1000000 as seen in the large difference between the third quartile and max resale price.

9.2.3 Standardized Residual

Visualization

The following code chunk plots the spatial point map of standardized Residual.

Show code
tm_shape(mpsz_svy21)+
  tm_polygons() +
  tmap_options(check.and.fix = TRUE) +
tm_shape(hdb_final_sf_fixed1) +  
  tm_dots(col = "Stud_residual",
          border.col = "gray60",
          border.lwd = 1) +
  tm_layout(legend.text.size = 1)

From the residual map, we can see that majority of the points are yellow or light green which means that the standardized residual is between -5 to 0, or between 0 to 5. We couldn’t really determine the range because the bin is huge. Hence, we will need to see the summary statistic.

Summary statistics

The following code chunk checks the statistics of the standardized residual using summary function

Show code
summary(hdb_final_sf_fixed1$Stud_residual)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-8.53156 -0.55717  0.03430 -0.00703  0.59431 11.84665 

From the summary statistics, the min is beyond -1 and the max is beyond 1 and they are outliers because of the huge difference from the 1st quartile and 3rd quartile respectively. However, the mean and median is almost near 0 which shows that majority of the prediction by the fixed GWR model are accurate with a low residual.